Take-home Exercise 02

Author

Andrea Yeo

Published

February 24, 2025

Modified

March 8, 2025

Be Tradewise or Otherwise

All values are in Million Dollars

1. Overview

1.1 Background

Since Mr. Donald Trump assumed office as the President of the United States on January 20, 2025, global trade has been a highly scrutinized topic. Given Singapore’s role as a key global trade hub, understanding its trade dynamics is crucial in assessing the potential impact of shifting geopolitical and economic policies.

In this take-home exercise, we apply newly acquired data visualization and analytical techniques to explore Singapore’s Trade in Services - Singapore’s exports and imports of services between Singapore and the rest of the world. By leveraging statistical tools and visualization methods, this study aims to uncover key trends, service category distributions, and trade imbalances while providing deeper insights into how different service sectors contribute to Singapore’s economy.

1.2 Research objective

Using data from the Department of Statistics Singapore (DOS) on Trade In Services By Services Category, this analysis applies Exploratory Data Analysis (EDA) and data visualization techniques to:

  • Assess three existing visualizations from this page, identifying the pros and cons of each.
  • Redesign and enhance these visualizations using ggplot2 and other R packages to improve data interpretation, visual appeal, and accessibility.
  • Conduct time-series analysis and forecasting to identify key trends in Singapore’s trade in services, evaluate growth patterns across service categories, and predict potential future trade movements.

2. Getting started

2.1 Loading the packages

The following R packages will be loaded for this exercise using pacman::p_load():

  • readxl: Reads Excel files (.xls and .xlsx) into R
  • writexl: Writes data frames to Excel files (.xlsx).
  • tidyverse: A collection of packages for data manipulation, visualization, and modeling.
  • timetk: Provides time series analysis and forecasting tools.
  • forecast: Implements forecasting methods like ARIMA, ETS, and more.
  • ggplot2: A powerful package for data visualization using the grammar of graphics.
  • plotly: Creates interactive plots, including 3D and web-based visualizations.
  • treemapify: Enables the creation of treemap visualizations in ggplot2.
  • patchwork: Helps combine multiple ggplot2 plots into a single layout.
  • dplyr: Provides fast and intuitive data manipulation functions.
  • CGPfunctions: Offers additional plotting and visualization functions.
  • rnaturalearth: Provides world map data for geographic visualizations.
  • rnaturalearthdata: Supplies natural Earth vector data for spatial analysis.
  • sf: Supports simple features for spatial data (GIS).
  • viridis: Provides color palettes for better visualization accessibility.
  • ggrepel: Improves text labeling in ggplot2 by avoiding overlapping labels.
  • ggHoriPlot: Creates horizon plots in ggplot2.
  • ggthemes: Adds additional themes and styles for ggplot2.
  • tidymodels: A collection of packages for machine learning and modeling.
  • modeltime: A framework for time series forecasting using machine learning and statistical models.
Code
pacman::p_load(readxl, writexl, tidyverse, timetk, forecast, ggplot2, plotly, treemapify, patchwork, CGPfunctions, rnaturalearth, rnaturalearthdata, sf, viridis, ggrepel, ggHoriPlot, ggthemes, tidymodels, modeltime)

3. DataVis makeover 1

3.1 Makeover of Overall Exports and Imports of Services

3.1.1 Original visualization

The original visualizations analyzed in this analysis are sourced from here.

3.1.2 Evaluation of the original visualization

In the section below, we evaluate the effectiveness of the above visualization by identifying its pros and cons, focusing on (1) clarity and (2) visual appeal:

  • Clarity: How well the data is presented and understood
    • Pro: Clear categorization of exports and imports – the chart effectively differentiates between exports and imports.
      Con: Year-specific colors make trend analysis difficult – each year is assigned a different color, making it hard to track trends across time.
      Suggested fix: Use consistent colors for exports and imports across all years (implemented by converting them into line charts).
    • Pro: Total trade values are highlighted – the total trade (Exports + Imports) is displayed for each year.
      Con: Floating “Total” labels can be easily overlooked – they are placed above the bars separately, which may lead to misinterpretation.
      Suggested fix: Introduce a line chart for total trade trends instead of floating labels.
    • Pro: Growth rate (CAGR) and trade balance data are included – additional insights are provided.
      Con: Trade balance is not well integrated – it is displayed separately at the bottom instead of being visually linked to the bars.
      Suggested fix: Represent trade balance directly as a bar chart (green for surplus, red for deficit).
  • Visual appeal: How visually engaging and effective the design is
    • Pro: Engaging use of colors and icons – makes the visualization appealing and eye-catching.
      Con: Overuse of colors creates clutter – different colors for each year make it visually overwhelming.
      Suggested fix: Reduce unnecessary color variations and simplify color coding.

3.2 Makeover of the original data visualization

3.2.1 Data wrangling & data preparation

3.2.1.1 Importing the data

The code chunk below imports the Trade In Services By Services Category dataset, downloaded from the Department of Statistics Singapore (DOS), using the read_excel() function from the readxl package.

Code
trade_services <- read_excel("data/Trade In Services By Services Category_base.xlsx")
3.2.1.2 Understanding the data structure
  • glimpse(): provides a transposed overview of a dataset, showing variables and their types in a concise format.
  • head(): displays the first few rows of a dataset (default is 6 rows) to give a quick preview of the data.
  • summary(): generates a statistical summary of each variable, including measures like mean, median, and range for numeric data.
  • duplicated(): returns a logical vector indicating which elements or rows in a vector or data frame are duplicates.
  • colSums(is.na()): counts the number of missing values (NA) in each column of the data frame.
  • str(): displays the column names, data types, and a preview of the data.
Code
glimpse(trade_services)
Rows: 51
Columns: 26
$ `Data Series` <chr> "Total Trade In Services", "Exports Of Services", "Manuf…
$ `2024`        <dbl> 997749.8, 528568.3, 471.0, 10820.1, 172971.1, 145993.3, …
$ `2023`        <dbl> 919117.0, 481009.2, 490.7, 10981.0, 149537.4, 124222.5, …
$ `2022`        <dbl> 877252.5, 468190.5, 685.5, 10117.9, 189649.0, 167209.2, …
$ `2021`        <dbl> 715093.5, 382492.4, 489.9, 8833.5, 144220.0, 130499.3, 1…
$ `2020`        <dbl> 590035.1, 300004.6, 236.8, 8172.9, 93560.0, 79857.9, 137…
$ `2019`        <dbl> 592824.3, 307215.9, 266.9, 9663.3, 94272.1, 76545.7, 177…
$ `2018`        <dbl> 561271.0, 287141.4, 370.3, 9410.0, 88888.8, 71746.4, 171…
$ `2017`        <dbl> 493353.8, 241568.0, 247.1, 7712.0, 69993.5, 54547.9, 154…
$ `2016`        <dbl> 431109.3, 211835.8, 284.8, 8418.5, 59213.4, 45873.0, 133…
$ `2015`        <dbl> 432922.3, 210622.7, 346.5, 9315.2, 64097.1, 50798.1, 132…
$ `2014`        <dbl> 406020.8, 194843.2, 424.4, 9853.1, 63918.8, 50917.2, 130…
$ `2013`        <dbl> 365055.0, 177719.3, 283.2, 10767.2, 57830.9, 45929.4, 11…
$ `2012`        <dbl> 327866.3, 161769.2, 249.6, 9053.1, 55586.3, 42864.3, 127…
$ `2011`        <dbl> 298227.8, 150013.0, 260.4, 9342.9, 53523.0, 41416.7, 121…
$ `2010`        <dbl> 273929.7, 136872.3, 289.5, 8648.4, 52606.6, 41214.6, 113…
$ `2009`        <dbl> 238962.6, 117832.0, 323.4, 9128.1, 43365.7, 33042.6, 103…
$ `2008`        <dbl> 254282.0, 126155.0, 452.1, 8354.6, 51108.8, 38561.9, 125…
$ `2007`        <dbl> 223936.4, 110796.6, 492.6, 6605.6, 43642.8, 31104.6, 125…
$ `2006`        <dbl> 195966.0, 92674.8, 534.7, 5701.0, 35877.1, 24748.0, 1112…
$ `2005`        <dbl> 167532.8, 75904.8, 315.0, 4797.7, 32435.1, 21286.1, 1114…
$ `2004`        <dbl> 150911.1, 66795.8, 353.2, 3450.5, 28630.3, 18702.3, 9928…
$ `2003`        <dbl> 122427.9, 52966.2, 303.1, 2883.1, 23343.2, 15022.2, 8321…
$ `2002`        <dbl> 109710.5, 49936.3, 369.5, 3071.8, 21539.3, 12656.8, 8882…
$ `2001`        <dbl> 102892.5, 46286.1, 195.0, 2099.5, 20497.9, 12079.2, 8418…
$ `2000`        <dbl> 96452.4, 44854.8, 202.4, 1755.3, 20379.3, 11595.5, 8783.…
Code
head(trade_services)
# A tibble: 6 × 26
  `Data Series`   `2024` `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016`
  <chr>            <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Total Trade In… 9.98e5 9.19e5 8.77e5 7.15e5 5.90e5 5.93e5 5.61e5 4.93e5 4.31e5
2 Exports Of Ser… 5.29e5 4.81e5 4.68e5 3.82e5 3.00e5 3.07e5 2.87e5 2.42e5 2.12e5
3 Manufacturing … 4.71e2 4.91e2 6.85e2 4.90e2 2.37e2 2.67e2 3.70e2 2.47e2 2.85e2
4 Maintenance An… 1.08e4 1.10e4 1.01e4 8.83e3 8.17e3 9.66e3 9.41e3 7.71e3 8.42e3
5 Transport       1.73e5 1.50e5 1.90e5 1.44e5 9.36e4 9.43e4 8.89e4 7.00e4 5.92e4
6 Freight         1.46e5 1.24e5 1.67e5 1.30e5 7.99e4 7.65e4 7.17e4 5.45e4 4.59e4
# ℹ 16 more variables: `2015` <dbl>, `2014` <dbl>, `2013` <dbl>, `2012` <dbl>,
#   `2011` <dbl>, `2010` <dbl>, `2009` <dbl>, `2008` <dbl>, `2007` <dbl>,
#   `2006` <dbl>, `2005` <dbl>, `2004` <dbl>, `2003` <dbl>, `2002` <dbl>,
#   `2001` <dbl>, `2000` <dbl>
Code
summary(trade_services)
 Data Series             2024               2023               2022         
 Length:51          Min.   :    57.8   Min.   :    56.2   Min.   :    59.6  
 Class :character   1st Qu.:  1855.3   1st Qu.:  1817.6   1st Qu.:  1603.0  
 Mode  :character   Median : 13495.7   Median : 12907.1   Median : 11700.4  
                    Mean   : 71188.6   Mean   : 65608.8   Mean   : 63405.1  
                    3rd Qu.: 45456.5   3rd Qu.: 43514.1   3rd Qu.: 37732.8  
                    Max.   :997749.8   Max.   :919117.0   Max.   :877252.5  
      2021               2020               2019               2018         
 Min.   :    62.4   Min.   :    57.5   Min.   :    52.1   Min.   :    66.8  
 1st Qu.:  1350.0   1st Qu.:  1209.1   1st Qu.:  1134.8   1st Qu.:  1051.2  
 Median :  9180.3   Median :  9124.8   Median : 10250.8   Median :  9410.0  
 Mean   : 51559.2   Mean   : 42075.8   Mean   : 41792.4   Mean   : 39491.2  
 3rd Qu.: 35308.2   3rd Qu.: 30748.8   3rd Qu.: 29536.0   3rd Qu.: 27016.0  
 Max.   :715093.5   Max.   :590035.1   Max.   :592824.3   Max.   :561271.0  
      2017               2016               2015               2014         
 Min.   :    56.1   Min.   :    72.0   Min.   :    46.9   Min.   :    56.9  
 1st Qu.:   957.6   1st Qu.:   811.6   1st Qu.:   809.8   1st Qu.:   739.1  
 Median :  7848.2   Median :  6458.8   Median :  6194.0   Median :  6021.0  
 Mean   : 34523.8   Mean   : 29987.0   Mean   : 30256.4   Mean   : 28284.6  
 3rd Qu.: 23594.4   3rd Qu.: 22798.2   3rd Qu.: 23150.7   3rd Qu.: 22757.6  
 Max.   :493353.8   Max.   :431109.3   Max.   :432922.3   Max.   :406020.8  
      2013               2012               2011               2010         
 Min.   :    78.8   Min.   :    84.6   Min.   :    62.5   Min.   :    64.0  
 1st Qu.:   661.6   1st Qu.:   656.4   1st Qu.:   617.9   1st Qu.:   587.1  
 Median :  4647.2   Median :  4034.2   Median :  3397.2   Median :  3493.8  
 Mean   : 25332.7   Mean   : 22691.6   Mean   : 20633.9   Mean   : 19039.0  
 3rd Qu.: 21875.2   3rd Qu.: 19465.0   3rd Qu.: 17328.2   3rd Qu.: 16731.5  
 Max.   :365055.0   Max.   :327866.3   Max.   :298227.8   Max.   :273929.7  
      2009               2008               2007               2006         
 Min.   :    51.0   Min.   :    76.9   Min.   :   130.4   Min.   :    48.6  
 1st Qu.:   579.8   1st Qu.:   536.3   1st Qu.:   441.2   1st Qu.:   413.8  
 Median :  3015.6   Median :  3238.9   Median :  2213.8   Median :  2080.4  
 Mean   : 16671.3   Mean   : 17870.3   Mean   : 15789.5   Mean   : 13848.0  
 3rd Qu.: 13932.2   3rd Qu.: 14221.4   3rd Qu.: 13344.4   3rd Qu.: 11489.8  
 Max.   :238962.6   Max.   :254282.0   Max.   :223936.4   Max.   :195966.0  
      2005               2004               2003               2002         
 Min.   :    24.7   Min.   :    17.3   Min.   :    17.2   Min.   :    14.3  
 1st Qu.:   313.1   1st Qu.:   284.9   1st Qu.:   228.2   1st Qu.:   212.3  
 Median :  1524.5   Median :  1374.8   Median :  1108.7   Median :  1001.9  
 Mean   : 11792.0   Mean   : 10647.8   Mean   :  8587.8   Mean   :  7648.4  
 3rd Qu.: 10741.8   3rd Qu.:  9462.4   3rd Qu.:  7513.7   3rd Qu.:  8339.2  
 Max.   :167532.8   Max.   :150911.1   Max.   :122427.9   Max.   :109710.5  
      2001               2000        
 Min.   :    22.6   Min.   :   13.6  
 1st Qu.:   188.7   1st Qu.:  170.6  
 Median :   778.0   Median :  794.0  
 Mean   :  7219.0   Mean   : 6794.9  
 3rd Qu.:  8164.2   3rd Qu.: 7509.9  
 Max.   :102892.5   Max.   :96452.4  
Code
trade_services[duplicated(trade_services),]
# A tibble: 0 × 26
# ℹ 26 variables: Data Series <chr>, 2024 <dbl>, 2023 <dbl>, 2022 <dbl>,
#   2021 <dbl>, 2020 <dbl>, 2019 <dbl>, 2018 <dbl>, 2017 <dbl>, 2016 <dbl>,
#   2015 <dbl>, 2014 <dbl>, 2013 <dbl>, 2012 <dbl>, 2011 <dbl>, 2010 <dbl>,
#   2009 <dbl>, 2008 <dbl>, 2007 <dbl>, 2006 <dbl>, 2005 <dbl>, 2004 <dbl>,
#   2003 <dbl>, 2002 <dbl>, 2001 <dbl>, 2000 <dbl>
Note
  • Ensure that there are no duplicated rows; if any are found, investigate further.
Code
colSums(is.na(trade_services))
Data Series        2024        2023        2022        2021        2020 
          0           0           0           0           0           0 
       2019        2018        2017        2016        2015        2014 
          0           0           0           0           0           0 
       2013        2012        2011        2010        2009        2008 
          0           0           0           0           0           0 
       2007        2006        2005        2004        2003        2002 
          0           0           0           0           0           0 
       2001        2000 
          0           0 
Note
  • Ensure that there are no NA values; if any are found, investigate further.
  • If needed, the drop_na() function can be used to drop rows where any specified column contains a missing value.
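As a sketch of that fallback (a no-op here, since no NAs were found in trade_services), drop_na() from tidyr removes rows containing missing values:

```r
library(tidyr)

# Hypothetical illustration only - trade_services contains no NAs.
# With no arguments, drop_na() drops rows with an NA in any column;
# supplying columns restricts the check to those columns.
trade_services_clean <- trade_services %>%
  drop_na()

trade_services_recent <- trade_services %>%
  drop_na(`2024`, `2023`)  # drop rows with NA in the 2024 or 2023 columns only
```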
Code
str(trade_services)
tibble [51 × 26] (S3: tbl_df/tbl/data.frame)
 $ Data Series: chr [1:51] "Total Trade In Services" "Exports Of Services" "Manufacturing Services On Physical Inputs Owned By Others" "Maintenance And Repair Services" ...
 $ 2024       : num [1:51] 997750 528568 471 10820 172971 ...
 $ 2023       : num [1:51] 919117 481009 491 10981 149537 ...
 $ 2022       : num [1:51] 877253 468191 686 10118 189649 ...
 $ 2021       : num [1:51] 715094 382492 490 8834 144220 ...
 $ 2020       : num [1:51] 590035 300005 237 8173 93560 ...
 $ 2019       : num [1:51] 592824 307216 267 9663 94272 ...
 $ 2018       : num [1:51] 561271 287141 370 9410 88889 ...
 $ 2017       : num [1:51] 493354 241568 247 7712 69994 ...
 $ 2016       : num [1:51] 431109 211836 285 8418 59213 ...
 $ 2015       : num [1:51] 432922 210623 346 9315 64097 ...
 $ 2014       : num [1:51] 406021 194843 424 9853 63919 ...
 $ 2013       : num [1:51] 365055 177719 283 10767 57831 ...
 $ 2012       : num [1:51] 327866 161769 250 9053 55586 ...
 $ 2011       : num [1:51] 298228 150013 260 9343 53523 ...
 $ 2010       : num [1:51] 273930 136872 290 8648 52607 ...
 $ 2009       : num [1:51] 238963 117832 323 9128 43366 ...
 $ 2008       : num [1:51] 254282 126155 452 8355 51109 ...
 $ 2007       : num [1:51] 223936 110797 493 6606 43643 ...
 $ 2006       : num [1:51] 195966 92675 535 5701 35877 ...
 $ 2005       : num [1:51] 167533 75905 315 4798 32435 ...
 $ 2004       : num [1:51] 150911 66796 353 3450 28630 ...
 $ 2003       : num [1:51] 122428 52966 303 2883 23343 ...
 $ 2002       : num [1:51] 109711 49936 370 3072 21539 ...
 $ 2001       : num [1:51] 102893 46286 195 2100 20498 ...
 $ 2000       : num [1:51] 96452 44855 202 1755 20379 ...
Note
  • Ensure that all variables are correctly classified by data type; recast variable types if needed.
  • Variables are correctly classified: the categorical variable is stored as character, while the continuous variables are stored as double.
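If any year columns had instead been imported as character (e.g., because of stray text in the Excel file), they could be recast with mutate(across()) - a hypothetical sketch, since no recasting is required for this dataset:

```r
library(dplyr)

# Hypothetical fix only - the year columns here are already <dbl>.
# Coerce every column except the key column to numeric.
trade_services <- trade_services %>%
  mutate(across(-`Data Series`, as.numeric))
```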

The trade_services tibble contains 26 attributes, as shown above.

The following preprocessing checks were conducted as part of data preparation:

Preprocessing Checks
  • Verified that the correct data types were loaded in the trade_services dataset using glimpse() and str()
  • Ensured there were no duplicate rows in the dataset using duplicated()
  • Checked for missing values using colSums(is.na())
Note
  • trade_services dataset will be used for both DataVis makeover 1 & 2
3.2.1.3 Data preparation and filtering for relevant variables

After importing the trade_services dataset, we filter for the three rows (Exports Of Services, Imports Of Services, and Total Trade In Services) that are essential for recreating the original visualization.

Code
# Select rows where 'Data Series' is either "Exports Of Services" or "Imports Of Services"
export_vs_import <- trade_services %>%
  filter(`Data Series` %in% c("Exports Of Services", "Imports Of Services", "Total Trade In Services"))

# View the filtered data
print(export_vs_import)
# A tibble: 3 × 26
  `Data Series`   `2024` `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016`
  <chr>            <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Total Trade In… 9.98e5 9.19e5 8.77e5 7.15e5 5.90e5 5.93e5 5.61e5 4.93e5 4.31e5
2 Exports Of Ser… 5.29e5 4.81e5 4.68e5 3.82e5 3.00e5 3.07e5 2.87e5 2.42e5 2.12e5
3 Imports Of Ser… 4.69e5 4.38e5 4.09e5 3.33e5 2.90e5 2.86e5 2.74e5 2.52e5 2.19e5
# ℹ 16 more variables: `2015` <dbl>, `2014` <dbl>, `2013` <dbl>, `2012` <dbl>,
#   `2011` <dbl>, `2010` <dbl>, `2009` <dbl>, `2008` <dbl>, `2007` <dbl>,
#   `2006` <dbl>, `2005` <dbl>, `2004` <dbl>, `2003` <dbl>, `2002` <dbl>,
#   `2001` <dbl>, `2000` <dbl>

3.2.2 Revised visualization - Combination plot

Key makeover changes:

1️⃣ Overall design improvements:

  • Replaced bar chart with a line chart for exports and imports –> Helps in tracking trends more effectively over time instead of color-coded bars per year.

  • Introduced a dashed line for total trade values –> Previously, total trade was only displayed as floating labels above bars, which could be overlooked.

  • Changed color usage –> Original chart had different colors for each year, making trend analysis harder. Now, consistent colors are used:

    • Exports are in red,

    • Imports are in blue,

    • Total Trade is in black (dashed),

    • Trade Surplus is in green, and

    • Trade Deficit is in red.

2️⃣ Trade Balance Integration:

  • Integration of key insights –> Previously, the trade balance was displayed separately as a small section at the bottom. Now, it is integrated as a bar chart within the main graph:

    • Green bars represent trade surplus, and

    • red bars represent trade deficit.

Overall
  • The new chart is cleaner, and provides a better narrative of how exports, imports, total trade, and trade balance evolve over time.
  • The new chart simplifies year-on-year trend analysis, making it easier to identify patterns in Singapore’s international trade.

Code
# Filter relevant rows (Exports, Imports, and Total Trade)
export_vs_import <- trade_services %>%
  filter(`Data Series` %in% c("Exports Of Services", "Imports Of Services", "Total Trade In Services"))

# Convert dataset from wide to long format for ggplot
data_long <- export_vs_import %>%
  pivot_longer(cols = -`Data Series`, names_to = "Year", values_to = "Value") %>%
  mutate(Year = as.numeric(Year)) %>%
  filter(Year >= 2020 & Year <= 2024)

# Extract `Total Trade In Services` from dataset
total_trade_data <- data_long %>%
  filter(`Data Series` == "Total Trade In Services") %>%
  select(Year, Value) %>%
  rename(Total_Trade = Value)

# Calculate Trade Balance (Exports - Imports)
trade_balance <- data_long %>%
  spread(`Data Series`, Value) %>%
  mutate(Services_Trade_Balance = `Exports Of Services` - `Imports Of Services`) %>%
  select(Year, Services_Trade_Balance)

# Merge Trade Balance & Total Trade into DataFrame
data_long <- left_join(data_long, trade_balance, by = "Year")
data_long <- left_join(data_long, total_trade_data, by = "Year")

# Increase plotting window size
options(repr.plot.width=16, repr.plot.height=9)  

# Create the plot
p <- ggplot(data_long) +

  # **Line Chart for Exports**
  geom_line(data = subset(data_long, `Data Series` == "Exports Of Services"), 
            aes(x = Year, y = Value, color = "Exports"), linewidth = 1.5) +
  
  # **Data Labels for Exports**
  geom_text(data = subset(data_long, `Data Series` == "Exports Of Services"), 
            aes(x = Year, y = Value, label = round(Value, 1)), 
            vjust = -1, size = 2.5, color = "red") +

  # **Line Chart for Imports**
  geom_line(data = subset(data_long, `Data Series` == "Imports Of Services"), 
            aes(x = Year, y = Value, color = "Imports"), linewidth = 1.5) +

  # **Data Labels for Imports**
  geom_text(data = subset(data_long, `Data Series` == "Imports Of Services"), 
            aes(x = Year, y = Value, label = round(Value, 1)), 
            vjust = 1.5, size = 2.5, color = "blue") +

  # **Line Chart for Total Trade (Black, Dashed)**
  geom_line(data = total_trade_data,
            aes(x = Year, y = Total_Trade, color = "Total Trade"), linewidth = 1.5, linetype = "dashed") +

  # **Labels for Total Trade**
  geom_text(data = total_trade_data,
            aes(x = Year, y = Total_Trade + 20000, label = round(Total_Trade, 1)), 
            vjust = -0.5, size = 2.5, color = "black") +

  # **Bar Chart for Trade Balance (Surplus = Green, Deficit = Red)**
  geom_bar(data = trade_balance, 
           aes(x = Year, y = Services_Trade_Balance, 
               fill = ifelse(Services_Trade_Balance >= 0, "Surplus", "Deficit")), 
           stat = "identity", width = 0.5) +

  # **Move Labels to the Top of Trade Balance Bars**
  geom_text(data = trade_balance, 
            aes(x = Year, y = Services_Trade_Balance + 5000, 
                label = round(Services_Trade_Balance, 1)), 
            vjust = -0.5, size = 2.5, color = "black") +

  # **Title & Labels**
  labs(title = "Trends in International Trade in Services (2020-2024)",
       subtitle = "Exports, Imports & Total Trade as Lines, Trade Balance as Bars",
       x = "Year", y = "S$ Million") +

  # **Custom Theme**
  theme_minimal(base_size = 10) +  

  # **Color Customization**
  scale_color_manual(name = "Category", values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
  scale_fill_manual(name = "Trade Balance", values = c("Surplus" = "green", "Deficit" = "red")) +

  # **Formatting**
  theme(
        legend.position = "bottom",
        panel.grid.major.y = element_line(color = "gray", linetype = "dashed"),
        axis.text.x = element_text(face = "bold"))

# Display the plot
print(p)

3.3 Time series analysis

I conducted a time-series analysis to examine trends in exports, imports, and total trade from 2000 to 2024. First, I visualized the historical data using ggplot2, assigning consistent colors (red for exports, blue for imports, and black for total trade) to differentiate the trends clearly. This provided insights into the overall growth patterns and fluctuations over time.

Key observations:

  • Steady growth: Singapore’s total trade in services has shown a strong upward trend, reflecting its role as a global trade hub.
  • Exports Lead Imports: While both have grown, exports consistently exceed imports, indicating a positive trade balance.
  • Impact of Global Events: Periods of slower growth (e.g., 2008-2012) align with economic disruptions like the Global Financial Crisis.
  • Accelerated Growth After 2018: Likely driven by Singapore’s push for digital services, financial innovation, and trade agreements.
Code
# Load necessary libraries
library(ggplot2)
library(tidyverse)
library(forecast)
library(plotly)

# Convert data from wide to long format
export_vs_import_long <- export_vs_import %>%
  pivot_longer(cols = -`Data Series`, names_to = "Year", values_to = "Value") %>%
  mutate(Year = as.numeric(Year)) %>%
  arrange(Year)  # Ensure years are sorted in ascending order

# Create separate data frames for each category to ensure proper order
exports_data <- export_vs_import_long %>% filter(`Data Series` == "Exports Of Services")
imports_data <- export_vs_import_long %>% filter(`Data Series` == "Imports Of Services")
total_trade_data <- export_vs_import_long %>% filter(`Data Series` == "Total Trade In Services")

# Convert to time-series format (with correctly sorted values)
exports_ts <- ts(exports_data$Value, start = exports_data$Year[1], frequency = 1)
imports_ts <- ts(imports_data$Value, start = imports_data$Year[1], frequency = 1)
total_trade_ts <- ts(total_trade_data$Value, start = total_trade_data$Year[1], frequency = 1)

# Create Data Frame for Plotting
df <- data.frame(
  Year = rep(exports_data$Year, 3),
  Value = c(exports_data$Value, imports_data$Value, total_trade_data$Value),
  Category = rep(c("Exports", "Imports", "Total Trade"), each = nrow(exports_data))
)

# Create a ggplot with explicit group aesthetic
p <- ggplot(df, aes(x = Year, y = Value, color = Category, group = Category, 
                     text = paste0("Year: ", Year, "<br>Value: ", round(Value, 1)))) +
  geom_line(linewidth = 1.2) +  # Use linewidth instead of size (ggplot2 3.4.0+)
  scale_color_manual(values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
  ggtitle("Export vs Import vs Total Trade Trends") +
  ylab("Value (in Millions)") + xlab("Year") +
  theme_minimal()

# Convert to interactive plot with tooltips
interactive_plot <- ggplotly(p, tooltip = "text")

# Display interactive plot
interactive_plot

3.4 Time series forecasting

3.4.1 ETS vs ARIMA model comparison

Based on the model accuracy metrics, ARIMA outperforms ETS in forecasting both exports and imports. ARIMA consistently shows lower RMSE, MAE, and MAPE values, indicating that its predictions are more precise and closer to actual observations. Thus, ARIMA will be the preferred model for time series forecasting.

                    ME     RMSE       MAE        MPE      MAPE      MASE
Training set  2250.362 11275.74  8425.338 -0.1740338  7.973583 0.5737539
Test set     66791.835 86554.64 77415.700 13.2622158 16.803450 5.2719028
                   ACF1
Training set 0.05311589
Test set             NA
                    ME     RMSE       MAE       MPE      MAPE      MASE
Training set  3516.706 10885.93  7472.406  2.582523  5.207104 0.5088606
Test set     54848.687 74121.02 67064.995 10.655012 14.727052 4.5670340
                  ACF1
Training set -0.109228
Test set            NA
# A tibble: 4 × 5
  Model Type      RMSE    MAE  MAPE
  <chr> <chr>    <dbl>  <dbl> <dbl>
1 ETS   Exports 86555. 77416.  16.8
2 ETS   Imports 89216. 74235.  17.4
3 ARIMA Exports 74121. 67065.  14.7
4 ARIMA Imports 81957. 68397.  16.0
Code
library(forecast)
library(tidyverse)
library(ggplot2)

# Set forecast horizon (e.g., last 5 years)
horizon <- 5

# Ensure there are enough observations for training and test
if (length(exports_ts) <= horizon | length(imports_ts) <= horizon) {
  stop("Not enough data points for proper train-test split!")
}

# Split data: Use last `horizon` years as the test set
train_exports <- head(exports_ts, length(exports_ts) - horizon)
test_exports <- tail(exports_ts, horizon)

train_imports <- head(imports_ts, length(imports_ts) - horizon)
test_imports <- tail(imports_ts, horizon)

# Fit ETS models
ets_exports <- ets(train_exports)
ets_imports <- ets(train_imports)

# Fit ARIMA models
arima_exports <- auto.arima(train_exports)
arima_imports <- auto.arima(train_imports)

# 🔹 Generate Forecasts BEFORE extracting forecast values
ets_forecast_exports <- forecast(ets_exports, h = horizon)
ets_forecast_imports <- forecast(ets_imports, h = horizon)

arima_forecast_exports <- forecast(arima_exports, h = horizon)
arima_forecast_imports <- forecast(arima_imports, h = horizon)

# 🔹 Extract the forecasted mean values
ets_forecast_exports_values <- as.numeric(ets_forecast_exports$mean)
ets_forecast_imports_values <- as.numeric(ets_forecast_imports$mean)

arima_forecast_exports_values <- as.numeric(arima_forecast_exports$mean)
arima_forecast_imports_values <- as.numeric(arima_forecast_imports$mean)

# Ensure test sets are numeric
test_exports <- as.numeric(test_exports)
test_imports <- as.numeric(test_imports)

# Ensure test and forecast lengths match
if (length(test_exports) != length(ets_forecast_exports_values)) {
  test_exports <- head(test_exports, length(ets_forecast_exports_values))
}

if (length(test_imports) != length(ets_forecast_imports_values)) {
  test_imports <- head(test_imports, length(ets_forecast_imports_values))
}

# 🔹 Use accuracy() correctly with the model object
ets_accuracy_exports <- forecast::accuracy(ets_forecast_exports, test_exports)
ets_accuracy_imports <- forecast::accuracy(ets_forecast_imports, test_imports)

arima_accuracy_exports <- forecast::accuracy(arima_forecast_exports, test_exports)
arima_accuracy_imports <- forecast::accuracy(arima_forecast_imports, test_imports)

print(ets_accuracy_exports)
print(arima_accuracy_exports)

# 🔹 Combine accuracy results into a structured dataframe
accuracy_df <- tibble(
  Model = rep(c("ETS", "ARIMA"), each = 2),
  Type = rep(c("Exports", "Imports"), times = 2),
  RMSE = c(ets_accuracy_exports["Test set", "RMSE"], 
           ets_accuracy_imports["Test set", "RMSE"],
           arima_accuracy_exports["Test set", "RMSE"], 
           arima_accuracy_imports["Test set", "RMSE"]),
  MAE = c(ets_accuracy_exports["Test set", "MAE"], 
          ets_accuracy_imports["Test set", "MAE"],
          arima_accuracy_exports["Test set", "MAE"], 
          arima_accuracy_imports["Test set", "MAE"]),
  MAPE = c(ets_accuracy_exports["Test set", "MAPE"], 
           ets_accuracy_imports["Test set", "MAPE"],
           arima_accuracy_exports["Test set", "MAPE"], 
           arima_accuracy_imports["Test set", "MAPE"])
)

# Print final accuracy dataframe
print(accuracy_df)

3.4.2 ARIMA model

Next, I applied ARIMA modeling (auto.arima()) using the forecast package to predict trade values for the next five years (2025-2029).

Code
library(forecast)

# Apply ARIMA Model
arima_exports <- auto.arima(exports_ts)
arima_imports <- auto.arima(imports_ts)
arima_total_trade <- auto.arima(total_trade_ts)

# Forecast for the next 5 years
arima_forecast_exports <- forecast(arima_exports, h = 5)
arima_forecast_imports <- forecast(arima_imports, h = 5)
arima_forecast_total_trade <- forecast(arima_total_trade, h = 5)

# Plot ARIMA forecasts
autoplot(arima_forecast_exports) + ggtitle("ARIMA Forecast for Exports")
autoplot(arima_forecast_imports) + ggtitle("ARIMA Forecast for Imports")
autoplot(arima_forecast_total_trade) + ggtitle("ARIMA Forecast for Total Trade")

# Convert forecasts to data frames for ggplot
df_exports <- as.data.frame(arima_forecast_exports) %>% mutate(Year = seq(max(exports_data$Year) + 1, by = 1, length.out = 5), Category = "Exports")
df_imports <- as.data.frame(arima_forecast_imports) %>% mutate(Year = seq(max(imports_data$Year) + 1, by = 1, length.out = 5), Category = "Imports")
df_total_trade <- as.data.frame(arima_forecast_total_trade) %>% mutate(Year = seq(max(total_trade_data$Year) + 1, by = 1, length.out = 5), Category = "Total Trade")

# Combine all forecasts
df_forecast <- bind_rows(df_exports, df_imports, df_total_trade)

# Create the forecast plot
ggplot(df_forecast, aes(x = Year, y = `Point Forecast`, color = Category)) +
  geom_line(linewidth = 1.5) +  # Use linewidth instead of size (ggplot2 3.4.0+)
  scale_color_manual(values = c("Exports" = "red", "Imports" = "blue", "Total Trade" = "black")) +
  ggtitle("ARIMA Forecast for Exports, Imports, and Total Trade") +
  ylab("Forecasted Value (in Millions)") + xlab("Year") +
  theme_minimal()

3.4.3 Forecast accuracy

We also test the accuracy of the forecast models to evaluate their performance and reliability.

Standard market practice for evaluating forecast accuracy relies on key metrics such as MAPE, RMSE, MAE, and ACF1.
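As a quick illustration of how these metrics are defined (toy numbers, not the trade data; forecast::accuracy() computes the same quantities):

```r
# Toy example: how MAE, RMSE, and MAPE are computed from forecast errors
actual    <- c(100, 120, 140, 160)
predicted <- c(110, 115, 150, 155)
err <- actual - predicted

mae  <- mean(abs(err))                 # average absolute error
rmse <- sqrt(mean(err^2))              # penalizes large errors more heavily
mape <- mean(abs(err / actual)) * 100  # scale-free, expressed in percent

round(c(MAE = mae, RMSE = rmse, MAPE = mape), 2)
# MAE = 7.50, RMSE = 7.91, MAPE = 6.11
```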

  • Based on the training-set measures, the imports model fits best (MAPE ~4.84%), with exports and total trade in the “good” range (MAPE ~6.10% and ~5.23%). The test-set MAPEs, however, are around 50% for all three series, so out-of-sample accuracy is far weaker than the in-sample fit suggests.

  • The absence of significant autocorrelation in the training residuals (ACF1 close to zero) suggests the models capture the in-sample trend structure.

  • The very large test-set errors (and negative ME/MPE values), particularly for total trade, indicate the models substantially over-forecast the held-out years, pointing to reliability issues for forward projections.
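The residual-autocorrelation point can be checked explicitly with a Ljung-Box test (a sketch, assuming the arima_exports model fitted earlier; a large p-value indicates no significant remaining autocorrelation):

```r
# Ljung-Box test on the ARIMA residuals; H0: residuals are white noise
Box.test(residuals(arima_exports), lag = 10, type = "Ljung-Box")

# The forecast package also bundles a residual diagnostic plot plus test
forecast::checkresiduals(arima_exports)
```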

### Forecast Accuracy for Exports ###
                      ME      RMSE       MAE        MPE      MAPE       MASE
Training set    5842.817  21775.54  13733.86   2.777663  6.099909  0.6402954
Test set     -215550.380 217655.84 215550.38 -53.417606 53.417606 10.0493182
                    ACF1
Training set -0.09397345
Test set              NA

### Forecast Accuracy for Imports ###
                      ME      RMSE       MAE        MPE      MAPE       MASE
Training set    3887.432  15372.93  10045.78   1.659329  4.840996  0.5509206
Test set     -182952.628 184188.44 182952.63 -49.695067 49.695067 10.0333023
                   ACF1
Training set 0.03707208
Test set             NA

### Forecast Accuracy for Total Trade ###
                      ME      RMSE       MAE        MPE      MAPE       MASE
Training set    9537.844  35750.43  23074.03   2.170273  5.229894  0.5884102
Test set     -399957.217 403059.64 399957.22 -51.756773 51.756773 10.1992978
                    ACF1
Training set -0.02527777
Test set              NA
Code
# Compute and print accuracy
cat("### Forecast Accuracy for Exports ###\n")
print(forecast::accuracy(arima_forecast_exports, test_exports))

# Print forecast accuracy for imports
cat("\n### Forecast Accuracy for Imports ###\n")
print(forecast::accuracy(arima_forecast_imports, test_imports))

# Print forecast accuracy for total trade
cat("\n### Forecast Accuracy for Total Trade ###\n")
print(forecast::accuracy(arima_forecast_total_trade, test_total_trade))
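The hold-out series test_exports, test_imports, and test_total_trade used above come from a train/test split of the annual data. A minimal sketch of how such a split can be built with window(), assuming exports_ts is an annual ts object (the split year here is an assumption for illustration):

```r
# Split the annual series: train up to 2021, hold out 2022 onwards
train_exports <- window(exports_ts, end = 2021)
test_exports  <- window(exports_ts, start = 2022)

# Refit on the training window only, then score against the hold-out years
fit_train <- forecast::auto.arima(train_exports)
fc_train  <- forecast::forecast(fit_train, h = length(test_exports))
forecast::accuracy(fc_train, test_exports)
```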

4. DataVis makeover 2

4.1 Makeover of - Exports/ Imports of Services by Services Category

4.1.1 Original visualization

The original visualization analyzed in this section is sourced from here.

4.1.2 Evaluation of the original visualization

In the section below, we evaluate the effectiveness of the original visualization by identifying its pros and cons, focusing on (1) clarity and (2) visual appeal:

  • Clarity: How well the data is presented and understood
| Pros | Cons | Suggested fixes |
|---|---|---|
| Clearly labeled service categories with values and percentages – each category is distinctly labeled, making it easy to understand the contribution of each service type. | Pie charts are not effective for comparing data slices – when multiple categories have close values, it is difficult to differentiate them accurately. | Replace pie charts with bar charts or grouped bar charts for clearer comparison of service categories. |
| Percentage values enhance clarity – they help users quickly understand each category’s contribution to total exports/imports. | Pie charts are only useful with a limited number of categories – too many slices make the chart difficult to interpret, as smaller sections become unreadable. | Replace with a bar chart or grouped bar chart to improve readability. |
| | The pie chart format makes trend comparison difficult – pie charts represent a single year and do not show how exports have changed over time. | Add a trend chart (e.g., a line graph or bar chart) – historical data helps identify patterns and trends over multiple years. |
  • Visual appeal: How visually engaging and effective the design is
| Pros | Cons | Suggested fixes |
|---|---|---|
| Good use of icons to represent different services – this makes the chart more engaging and helps with intuitive understanding. | Pie charts distort perception – it is hard to accurately compare slice sizes, especially when they are similar. | Use a bar chart, grouped bar chart, or treemap, which allows a more proportional representation of each service category. |
| Well-structured layout with categories spread around the chart – the service categories for both exports and imports maintain a similar placement. | Pie chart format is too cluttered – the large number of slices makes it difficult to read smaller segments and their labels. | Use a bar chart, grouped bar chart, or treemap for better spacing and spatial representation. |
| | Too many elements are packed into the chart, making it overwhelming. | Simplify by removing non-essential elements, e.g. icons. |
| | No legend for color segmentation – the chart uses various colors for different service categories, but there is no legend explaining their grouping or significance. | Include a legend or categorize colors meaningfully – assign a gradient-based color scale to emphasize trade volume intensity. |
Limitations of pie charts in data visualization
  • Distorted perception: Pie charts make comparisons difficult because they rely on angles and areas rather than a common baseline.
  • Difficult to compare similar data slices: They can be misleading when there are many segments or similar-sized portions, making differences hard to interpret accurately.
  • Space constraints: Pie charts can take up more space than necessary and can clutter dashboards or reports.
  • Poor for trend analysis: Pie charts only show a single point in time and do not help in comparing trends over multiple years.
Why is the use of pie charts frowned upon in data visualization?
  • Refer to this page to find out why the use of pie charts is discouraged.

4.2 Makeover of the original data visualization

4.2.1 Data preparation and filtering for relevant variables

After importing the trade_services data set, we will filter for the key categories that contribute to the exports/ imports of services in Singapore, excluding subcategories. The 12 major categories are identified based on indentation, as shown below:

| Data Series | Shortened label |
|---|---|
| Manufacturing Services On Physical Inputs Owned By Others | Manuf. Services |
| Maintenance And Repair Services | Maintenance & Repair |
| Transport | Transport |
| Travel | Travel |
| Insurance | Insurance |
| Government Goods And Services | Govt. Services |
| Construction | Construction |
| Financial | Financial Services |
| Telecommunications, Computer & Information | Telecom & IT |
| Charges For The Use Of Intellectual Property | Intellectual Property |
| Personal, Cultural And Recreational | Cultural & Recreational |
| Other Business Services | Other Biz Services |
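The indentation rule can be made explicit in code. In the raw DOS file, major categories carry less leading whitespace than their subcategories, so counting leading spaces before any trimws() call is one way to flag them (a sketch against the untrimmed trade_services tibble; the exact indent widths depend on the Excel file):

```r
# Leading-space count per row of the raw, untrimmed labels
raw_labels <- trade_services$`Data Series`
indent <- nchar(raw_labels) - nchar(sub("^\\s+", "", raw_labels))

table(indent)  # inspect the distinct indent levels first

# Assuming section headers sit at the shallowest level, the next level
# holds the 12 major categories (an assumption to verify against table(indent))
major_candidates <- raw_labels[indent == sort(unique(indent))[2]]
```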

We first clean the “Data Series” column by removing leading and trailing spaces to ensure consistency. Next, we identify the positions of “Exports Of Services” and “Imports Of Services” in the dataset, extracting only the rows between these markers for exports and those following the imports marker for imports. We then filter both datasets to retain only the 12 major service categories highlighted above.

Finally, we combine the cleaned exports and imports data into a single structured table, adding a “Trade Type” column to differentiate between exports and imports.

Code
# Trim leading/trailing spaces from 'Data Series' column
trade_services$`Data Series` <- trimws(trade_services$`Data Series`)

# Identify row indices for "Exports Of Services" and "Imports Of Services"
export_start_idx <- which(trade_services$`Data Series` == "Exports Of Services")
import_start_idx <- which(trade_services$`Data Series` == "Imports Of Services")

# Extract rows between "Exports Of Services" and "Imports Of Services"
exports_df <- trade_services[(export_start_idx + 1):(import_start_idx - 1), ]
imports_df <- trade_services[(import_start_idx + 1):nrow(trade_services), ]

# Define the 12 major categories
major_categories <- c(
  "Manufacturing Services On Physical Inputs Owned By Others",
  "Maintenance And Repair Services",
  "Transport",
  "Travel",
  "Insurance",
  "Government Goods And Services",
  "Construction",
  "Financial",
  "Telecommunications, Computer & Information",
  "Charges For The Use Of Intellectual Property",
  "Personal, Cultural And Recreational",
  "Other Business Services"
)

# Filter only the major categories for exports and imports
exports_major <- exports_df %>% filter(`Data Series` %in% major_categories)
imports_major <- imports_df %>% filter(`Data Series` %in% major_categories)

# Add a column to indicate whether it's exports or imports
exports_major$`Trade Type` <- "Exports"
imports_major$`Trade Type` <- "Imports"

# Combine into a single dataframe
final_trade_df <- bind_rows(exports_major, imports_major)

# Reorder columns: Move "Trade Type" to the second column
final_trade_df <- final_trade_df %>% select(`Data Series`, `Trade Type`, everything())

# View the final structured data
print(final_trade_df)
# A tibble: 24 × 27
   `Data Series`   `Trade Type` `2024` `2023` `2022` `2021` `2020` `2019` `2018`
   <chr>           <chr>         <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Manufacturing … Exports      4.71e2 4.91e2 6.85e2 4.90e2   237.   267.   370.
 2 Maintenance An… Exports      1.08e4 1.10e4 1.01e4 8.83e3  8173.  9663.  9410 
 3 Transport       Exports      1.73e5 1.50e5 1.90e5 1.44e5 93560  94272. 88889.
 4 Travel          Exports      3.19e4 2.79e4 1.57e4 5.40e3  7527. 27755. 27367.
 5 Insurance       Exports      1.26e4 1.11e4 1.15e4 9.13e3  8438   8817.  9485.
 6 Government Goo… Exports      4.77e2 4.64e2 4.44e2 4.22e2   412.   419.   414.
 7 Construction    Exports      2.10e3 2.04e3 1.62e3 1.20e3  1272.  1742.  1642.
 8 Financial       Exports      7.16e4 6.53e4 5.61e4 5.31e4 49688. 45944. 42840.
 9 Telecommunicat… Exports      4.11e4 3.93e4 3.42e4 3.19e4 26826. 20252. 20533.
10 Charges For Th… Exports      2.63e4 2.42e4 1.79e4 1.64e4 12866. 12297. 12084.
# ℹ 14 more rows
# ℹ 18 more variables: `2017` <dbl>, `2016` <dbl>, `2015` <dbl>, `2014` <dbl>,
#   `2013` <dbl>, `2012` <dbl>, `2011` <dbl>, `2010` <dbl>, `2009` <dbl>,
#   `2008` <dbl>, `2007` <dbl>, `2006` <dbl>, `2005` <dbl>, `2004` <dbl>,
#   `2003` <dbl>, `2002` <dbl>, `2001` <dbl>, `2000` <dbl>

4.2.2 Revised visualization - bar chart

Key makeover changes:

1️⃣ Single chart for imports and exports:

  • Combined visualization –> Instead of two separate pie charts, the new bar chart combines both, allowing for side-by-side comparison.

2️⃣ Improved readability and comparability:

  • Grouped bars (red for exports, blue for imports) –> Offer a clearer visual distinction between trade types.

  • Sorted categories (largest to smallest) –> Focus attention on key contributors, ensuring the most impactful sectors are easily identified.

3️⃣ Simplified layout:

  • Removed unnecessary graphics (e.g. icons, maps) –> Decluttered the chart to emphasize the trade data.

4️⃣ Interactive insights:

  • Hover tooltips –> Display category names, trade values, and percentage shares, making the chart more dynamic and user-friendly.
Overall
  • Better clarity, accuracy, and direct comparison in a single view.
  • Eliminates redundancy and improves clarity for faster data interpretation.
Code
# Load required libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(plotly)  # For interactive tooltips

# Filter data for the year 2024
trade_2024 <- final_trade_df %>%
  select(`Data Series`, `Trade Type`, `2024`) %>%
  rename(Value = `2024`)

# Shorten long category labels for better readability
trade_2024$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("^Financial$", "Financial Services", trade_2024$`Data Series`)  # anchored so "Financial" is not matched inside longer labels
trade_2024$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", trade_2024$`Data Series`)
trade_2024$`Data Series` <- gsub("Other Business Services", "Other Biz Services", trade_2024$`Data Series`)

# Compute Total Exports & Imports for Percentage Calculation
total_exports <- sum(trade_2024$Value[trade_2024$`Trade Type` == "Exports"], na.rm = TRUE)
total_imports <- sum(trade_2024$Value[trade_2024$`Trade Type` == "Imports"], na.rm = TRUE)

# Add Percentage Column
trade_2024 <- trade_2024 %>%
  mutate(Percentage = ifelse(`Trade Type` == "Exports", 
                             Value / total_exports * 100, 
                             Value / total_imports * 100))

# Sum values for sorting (descending order)
category_order <- trade_2024 %>%
  group_by(`Data Series`) %>%
  summarise(Total_Trade = sum(Value, na.rm = TRUE)) %>%
  arrange(desc(Total_Trade)) %>%
  pull(`Data Series`)

# Convert `Data Series` to factor for correct sorting
trade_2024$`Data Series` <- factor(trade_2024$`Data Series`, levels = category_order)

# Define custom colors for Exports (Red) and Imports (Blue)
custom_colors <- c("Exports" = "red", "Imports" = "blue")

# Create the ggplot object
gg <- ggplot(trade_2024, aes(x = `Data Series`, y = Value, fill = `Trade Type`,
                             text = paste("Category:", `Data Series`, 
                                          "<br>Trade Type:", `Trade Type`, 
                                          "<br>Value: S$", scales::comma(Value),
                                          "<br>Share:", round(Percentage, 1), "%"))) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8)) +  # Side-by-side bars
  scale_fill_manual(values = custom_colors) +  # Apply custom colors
  scale_y_continuous(labels = scales::comma) +  # Format y-axis labels
  labs(title = "Singapore Trade in Services - 2024",
       subtitle = "Side-by-Side Comparison of Imports and Exports for 12 Service Categories",
       x = "Service Category",
       y = "Trade Value",
       fill = "Trade Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  # Rotate x-axis labels
        legend.position = "bottom")  

# Convert to interactive plot with tooltips
ggplotly(gg, tooltip = "text")

4.2.3 Revised visualization - Treemap

Key makeover changes:

1️⃣ Transition from Pie charts to Treemaps

  • Pie charts are replaced with treemaps –> Treemaps use area size to provide a better spatial comparison than the circular slices of pie charts.

2️⃣ Added plotly interactive features:

  • Enabled interactive treemaps using plotly –> Allows hovering over categories to explore trade values more dynamically.

3️⃣ Gradient-based color scale for trade value:

  • Applied green gradient (darker green = higher value, lighter green = lower value) –> Visually emphasizes trade volume intensity instead of random colors in pie charts.
Code
# Load required libraries
library(dplyr)
library(tidyr)
library(treemap)
library(plotly)

# Filter data for the year 2024
trade_2024 <- final_trade_df %>%
  select(`Data Series`, `Trade Type`, `2024`) %>%
  rename(Value = `2024`)

# Shorten long category labels for better readability
trade_2024 <- trade_2024 %>%
  mutate(Short_Label = case_when(
    `Data Series` == "Manufacturing Services On Physical Inputs Owned By Others" ~ "Manuf. Services",
    `Data Series` == "Maintenance And Repair Services" ~ "Maintenance & Repair",
    `Data Series` == "Government Goods And Services" ~ "Govt. Services",
    `Data Series` == "Financial" ~ "Financial Services",
    `Data Series` == "Telecommunications, Computer & Information" ~ "Telecom & IT",
    `Data Series` == "Charges For The Use Of Intellectual Property" ~ "Intellectual Property",
    `Data Series` == "Personal, Cultural And Recreational" ~ "Cultural & Recreational",
    `Data Series` == "Other Business Services" ~ "Other Biz Services",
    TRUE ~ `Data Series`
  ))

# Split the dataset into Exports and Imports
trade_exports <- trade_2024 %>% filter(`Trade Type` == "Exports")
trade_imports <- trade_2024 %>% filter(`Trade Type` == "Imports")

# Create Treemap Data for Exports
treemap_exports <- treemap(trade_exports,
                           index = "Short_Label",
                           vSize = "Value",
                           type = "index",
                           palette = "Blues",
                           title = "Exports Treemap - 2024",
                           draw = FALSE)

# Create Treemap Data for Imports
treemap_imports <- treemap(trade_imports,
                           index = "Short_Label",
                           vSize = "Value",
                           type = "index",
                           palette = "Oranges",
                           title = "Imports Treemap - 2024",
                           draw = FALSE)

# Convert Exports Treemap Data to Plotly (Darker Green for Higher Values)
p1_interactive <- plot_ly(
  data = treemap_exports$tm,
  labels = ~Short_Label,
  parents = "",
  values = ~vSize,
  text = ~paste("Category:", Short_Label, "<br>Value:", vSize),
  type = "treemap",
  textinfo = "label+text",
  marker = list(
    colors = ~vSize,  # map each tile's trade value onto the colorscale
    colorscale = list(list(0, "#d9f2d9"), list(1, "#006400")),  # light to dark green
    cmin = min(treemap_exports$tm$vSize),
    cmax = max(treemap_exports$tm$vSize),
    colorbar = list(title = "Trade Value")
  )
) %>%
  layout(title = "Exports Treemap - 2024")

# Convert Imports Treemap Data to Plotly (Darker Green for Higher Values)
p2_interactive <- plot_ly(
  data = treemap_imports$tm,
  labels = ~Short_Label,
  parents = "",
  values = ~vSize,
  text = ~paste("Category:", Short_Label, "<br>Value:", vSize),
  type = "treemap",
  textinfo = "label+text",
  marker = list(
    colors = ~vSize,  # map each tile's trade value onto the colorscale
    colorscale = list(list(0, "#d9f2d9"), list(1, "#006400")),  # light to dark green
    cmin = min(treemap_imports$tm$vSize),
    cmax = max(treemap_imports$tm$vSize),
    colorbar = list(title = "Trade Value")
  )
) %>%
  layout(title = "Imports Treemap - 2024")

p1_interactive
p2_interactive

4.3 Time series analysis

4.3.1 Trade in Services (Exports) - 2000 to 2012 to 2024

Key observations:

  • Rapid Growth: Singapore’s service exports have grown significantly, reinforcing its role as a global trade hub.

  • Leading Sectors: Transport and Other Business Services dominate, showing the steepest growth.

  • Financial & Digital Expansion: Financial Services, Telecom & IT, and Intellectual Property have surged, driven by digitalization and economic policies.

  • Resilience & Acceleration: Despite slower growth around 2012 (likely due to economic shocks), post-2018 expansion highlights policy-driven trade growth.

Note
  • Singapore’s services exports continue to thrive, reflecting its strong economic positioning

Code
#| fig-width: 8
#| fig-height: 6

# Load required libraries
library(dplyr)
library(tidyr)
library(CGPfunctions)

# Ensure column names are trimmed properly
colnames(final_trade_df) <- trimws(colnames(final_trade_df))

# Shorten long category labels for better readability (Fixed repetitive label issue)
final_trade_df$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("^Financial$", "Financial Services", final_trade_df$`Data Series`)  
final_trade_df$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Other Business Services", "Other Biz Services", final_trade_df$`Data Series`)

# Filter for Exports only
exports_df <- final_trade_df %>% filter(`Trade Type` == "Exports")

# Select relevant columns: Data Series, 2000, 2012, 2024
exports_slope <- exports_df %>%
  select(`Data Series`, `2000`, `2012`, `2024`) %>%
  drop_na()

# Keep only the top 10 services in 2024 for clarity
exports_slope <- exports_slope %>% arrange(desc(`2024`)) %>% head(10)

# Convert data to long format for `newggslopegraph()`
exports_long <- exports_slope %>%
  pivot_longer(cols = c(`2000`, `2012`, `2024`), names_to = "Year", values_to = "Value")

# Convert Year to character (Fix for newggslopegraph)
exports_long$Year <- as.character(exports_long$Year)

# Increase figure height to improve label spacing
options(repr.plot.width = 14, repr.plot.height = 12)

# Create slopegraph with **2000, 2012, and 2024**
newggslopegraph(dataframe = exports_long,
                Times = Year,   
                Measurement = Value,
                Grouping = `Data Series`,
                Title = "Trade in Services (Exports) - Slopegraph (2000-2012-2024)",
                SubTitle = "Top 10 Services by Export Value",
                LineThickness = 1.0,
                DataTextSize = 2.5)

4.3.2 Trade in Services (Imports) - 2000 to 2012 to 2024

Key observations:

  • Strong Growth in Imports: Similar to exports, Singapore’s service imports have grown significantly, reflecting its global trade connectivity.

  • Transport & Other Business Services Lead: These two sectors dominate imports, mirroring export trends, but at slightly lower values.

  • Rising Financial Services & Telecom & IT: These sectors have seen notable import growth, indicating increasing reliance on foreign expertise and digital services.

  • Higher Import Dependency in Intellectual Property: Compared to exports, imports in Intellectual Property have increased faster, suggesting rising licensing and royalty payments.

Note
  • Singapore’s service imports align with its export-driven economy, with strong growth in digital services, finance, and transport.

Code
#| fig-width: 8
#| fig-height: 6

# Load required libraries
library(dplyr)
library(tidyr)
library(CGPfunctions)

# Ensure column names are trimmed properly
colnames(final_trade_df) <- trimws(colnames(final_trade_df))

# Shorten long category labels for better readability (Fixed repetitive label issue)
final_trade_df$`Data Series` <- gsub("Manufacturing Services On Physical Inputs Owned By Others", "Manuf. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Maintenance And Repair Services", "Maintenance & Repair", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Government Goods And Services", "Govt. Services", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("^Financial$", "Financial Services", final_trade_df$`Data Series`)  
final_trade_df$`Data Series` <- gsub("Telecommunications, Computer & Information", "Telecom & IT", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Charges For The Use Of Intellectual Property", "Intellectual Property", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Personal, Cultural And Recreational", "Cultural & Recreational", final_trade_df$`Data Series`)
final_trade_df$`Data Series` <- gsub("Other Business Services", "Other Biz Services", final_trade_df$`Data Series`)

# Filter for Imports only
imports_df <- final_trade_df %>% filter(`Trade Type` == "Imports")

# Select relevant columns: Data Series, 2000, 2012, 2024
imports_slope <- imports_df %>%
  select(`Data Series`, `2000`, `2012`, `2024`) %>%
  drop_na()

# Keep only the top 10 services in 2024 for clarity
imports_slope <- imports_slope %>% arrange(desc(`2024`)) %>% head(10)

# Convert data to long format for `newggslopegraph()`
imports_long <- imports_slope %>%
  pivot_longer(cols = c(`2000`, `2012`, `2024`), names_to = "Year", values_to = "Value")

# Convert Year to character (Fix for newggslopegraph)
imports_long$Year <- as.character(imports_long$Year)

# Increase figure height to improve label spacing
options(repr.plot.width = 14, repr.plot.height = 12)

# Create slopegraph with **2000, 2012, and 2024**
newggslopegraph(dataframe = imports_long,
                Times = Year,   
                Measurement = Value,
                Grouping = `Data Series`,
                Title = "Trade in Services (Imports) - Slopegraph (2000-2012-2024)",
                SubTitle = "Top 10 Services by Import Value",
                LineThickness = 1.0,
                DataTextSize = 2.5)

5. DataVis makeover 3

5.1 Makeover of - Major Trading Partners for Trade in Services, 2023

5.1.1 Original visualization

The original visualization analyzed in this section is sourced from here.

5.1.2 Evaluation of the original visualization

In the section below, we evaluate the effectiveness of the original visualization by identifying its pros and cons, focusing on (1) clarity and (2) visual appeal:

  • Clarity: How well the data is presented and understood
| Pros | Cons | Suggested fixes |
|---|---|---|
| Clear regional differentiation – the color-coded regions (North America, Europe, Asia, Oceania) make areas easy to distinguish. | Color scheme inconsistency – the colors used for regions in the world map do not match the second visualization, leading to confusion. | Ensure consistent colors across both graphics (e.g., North America should be red in both images). |
| Easy country identification – country flags provide quick recognition of major trading partners on the world map. | Lack of legend for flags – it is unclear whether the flags represent top import/export partners or something else. | Include a small legend explaining the significance of the flags. |
| Well-placed title – the main title is clear and ensures users immediately understand the topic. | World map does not indicate rankings – it is difficult to determine the largest trading partners. | Add ranking indicators, e.g. numbered markers or a gradient effect on the map. |
| | Need to scroll to see rankings – users must scroll down to learn that the U.S. is the top trading partner, followed by the EU and China. | Overlay rankings on the world map so users can see top trading partners at a glance. |
| | Lack of clear distinction between Regions, Countries, and Economic & Political Unions – the visualization presents them together, making it difficult to see how rankings differ. | Introduce clear labels or grouping for Regions, Countries, and Economic & Political Unions in both the world map and the bar chart. |
  • Visual appeal: How visually engaging and effective the design is
| Pros | Cons | Suggested fixes |
|---|---|---|
| Engaging world map design – the dotted world map with vibrant colors creates a modern and appealing look. | Color inconsistency between the two visuals – North America is red in the map but blue in the bar chart. | Use a unified color scheme so regions in both visuals match. |

5.2 Makeover of the original data visualization

5.2.1 Data wrangling & data preparation

5.2.1.1 Importing the data

The code chunk below imports the (1) Exports of Services by Major Trading Partner and (2) Imports of Services by Major Trading Partner datasets downloaded from the Department of Statistics Singapore (DOS), using the read_excel() function from the readxl package. These datasets contain the value of exports and imports of services by major trading partner for the years 2000 to 2023, which will be processed and analyzed in subsequent steps.

Code
export_partners <- read_excel("data/Exports Of Services By Major Trading Partner_base.xlsx")
Code
import_partners <- read_excel("data/Imports Of Services By Major Trading Partner_base.xlsx")
5.2.1.2 Understanding the export_partners data structure
  • glimpse(): provides a transposed overview of a dataset, showing variables and their types in a concise format.
  • head(): displays the first few rows of a dataset (6 by default) for a quick preview of the data.
  • summary(): generates a statistical summary of each variable, including measures like mean, median, and range for numeric data.
  • duplicated(): returns a logical vector indicating which elements or rows in a vector or data frame are duplicates.
  • colSums(is.na()): counts the number of missing values (NA) in each column of the data frame.
  • str(): displays the column names, data types, and a preview of the data.
Code
glimpse(export_partners)
Rows: 68
Columns: 25
$ `Data Series` <chr> "Asia", "Bangladesh", "Brunei Darussalam", "Cambodia", "…
$ `2023`        <dbl> 170787.7, 985.7, 960.1, 475.6, 21686.1, 9909.4, 8542.6, …
$ `2022`        <dbl> 174243.3, 903.9, 891.2, 389.5, 21307.3, 9730.1, 8366.6, …
$ `2021`        <dbl> 147125.9, 674.8, 579.0, 256.0, 18126.2, 7817.0, 6222.5, …
$ `2020`        <dbl> 115989.3, 581.6, 533.4, 250.3, 15063.2, 6337.6, 5351.4, …
$ `2019`        <dbl> 114573.1, 642.4, 556.4, 259.1, 11273.2, 6383.4, 6781.8, …
$ `2018`        <dbl> 104182.5, 664.0, 495.9, 290.4, 10466.3, 5957.8, 5895.2, …
$ `2017`        <dbl> 85093.3, 547.7, 463.9, 226.8, 8239.5, 5326.3, 5615.7, 25…
$ `2016`        <dbl> 70461.8, 482.6, 420.5, 196.1, 7351.8, 4422.4, 5088.6, 23…
$ `2015`        <dbl> 69881.1, 424.3, 465.4, 114.3, 7260.0, 4236.4, 5032.0, 17…
$ `2014`        <dbl> 62802.7, 384.3, 520.4, 86.3, 5841.4, 4276.0, 4837.1, 159…
$ `2013`        <dbl> 55949.9, 347.5, 474.4, 89.2, 5184.6, 4857.4, 4535.8, 155…
$ `2012`        <dbl> 50496.3, 349.1, 464.1, 115.0, 5156.3, 4375.0, 4062.7, 14…
$ `2011`        <dbl> 48318.2, 333.5, 468.5, 109.2, 5104.8, 4244.7, 3537.8, 13…
$ `2010`        <dbl> 46254.5, 272.6, 508.1, 78.4, 4928.8, 4089.3, 3291.1, 114…
$ `2009`        <chr> "40325.300000000003", "209.3", "324.39999999999998", "99…
$ `2008`        <dbl> 43935.5, 264.8, 272.6, 139.7, 4676.6, 3502.5, 3321.9, 93…
$ `2007`        <dbl> 39789.5, 284.0, 248.3, 88.1, 4429.7, 3191.3, 3026.0, 136…
$ `2006`        <dbl> 33098.0, 285.5, 120.6, 110.8, 3555.3, 2448.5, 2969.8, 10…
$ `2005`        <chr> "28070.400000000001", "274.60000000000002", "123.3", "77…
$ `2004`        <chr> "25457.1", "188.9", "138.6", "68.7", "2313.6999999999998…
$ `2003`        <chr> "18942", "157", "151.4", "45.3", "2059.6999999999998", "…
$ `2002`        <chr> "17741.7", "118.3", "144.30000000000001", "37.9", "1912"…
$ `2001`        <chr> "16226.5", "121.8", "97.8", "29.8", "1866.5", "735.4", "…
$ `2000`        <chr> "15816.9", "89.7", "100.5", "18.399999999999999", "1860.…
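Note that glimpse() reveals several year columns (2009, and 2005 through 2000) were imported as character rather than double, most likely due to footnote markers or formatting in the Excel source. A sketch of coercing every year column to numeric before analysis (suppressWarnings hides the NAs produced by any truly non-numeric cells):

```r
library(dplyr)

# Convert all columns except the label column to numeric
export_partners <- export_partners %>%
  mutate(across(-`Data Series`, ~ suppressWarnings(as.numeric(.x))))

# Sanity check: the previously character columns should now be numeric
sapply(export_partners[c("2009", "2005", "2000")], class)
```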
Code
head(export_partners)
# A tibble: 6 × 25
  `Data Series`   `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016` `2015`
  <chr>            <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Asia            1.71e5 1.74e5 1.47e5 1.16e5 1.15e5 1.04e5 85093. 70462. 69881.
2 Bangladesh      9.86e2 9.04e2 6.75e2 5.82e2 6.42e2 6.64e2   548.   483.   424.
3 Brunei Darussa… 9.60e2 8.91e2 5.79e2 5.33e2 5.56e2 4.96e2   464.   420.   465.
4 Cambodia        4.76e2 3.89e2 2.56e2 2.50e2 2.59e2 2.90e2   227.   196.   114.
5 Hong Kong       2.17e4 2.13e4 1.81e4 1.51e4 1.13e4 1.05e4  8240.  7352.  7260 
6 India           9.91e3 9.73e3 7.82e3 6.34e3 6.38e3 5.96e3  5326.  4422.  4236.
# ℹ 15 more variables: `2014` <dbl>, `2013` <dbl>, `2012` <dbl>, `2011` <dbl>,
#   `2010` <dbl>, `2009` <chr>, `2008` <dbl>, `2007` <dbl>, `2006` <dbl>,
#   `2005` <chr>, `2004` <chr>, `2003` <chr>, `2002` <chr>, `2001` <chr>,
#   `2000` <chr>
Code
summary(export_partners)
 Data Series             2023               2022               2021         
 Length:68          Min.   :   296.0   Min.   :   169.8   Min.   :   191.5  
 Class :character   1st Qu.:   789.3   1st Qu.:   685.9   1st Qu.:   560.1  
 Mode  :character   Median :  3133.4   Median :  3536.1   Median :  2923.7  
                    Mean   : 13585.2   Mean   : 13476.6   Mean   : 11404.5  
                    3rd Qu.:  9860.0   3rd Qu.:  9770.0   3rd Qu.:  8172.4  
                    Max.   :170787.7   Max.   :174243.3   Max.   :147125.9  
      2020               2019               2018               2017        
 Min.   :   104.1   Min.   :    93.3   Min.   :    85.0   Min.   :   67.9  
 1st Qu.:   430.4   1st Qu.:   461.6   1st Qu.:   434.7   1st Qu.:  351.5  
 Median :  2219.9   Median :  1751.8   Median :  1659.9   Median : 1156.8  
 Mean   :  8714.4   Mean   :  8282.8   Mean   :  7774.1   Mean   : 6370.7  
 3rd Qu.:  6733.2   3rd Qu.:  6483.0   3rd Qu.:  5910.9   3rd Qu.: 5398.6  
 Max.   :115989.3   Max.   :114573.1   Max.   :104182.5   Max.   :85093.3  
      2016              2015              2014              2013        
 Min.   :   72.0   Min.   :   72.1   Min.   :   61.5   Min.   :   66.5  
 1st Qu.:  295.7   1st Qu.:  370.0   1st Qu.:  311.5   1st Qu.:  307.6  
 Median : 1173.2   Median : 1549.8   Median : 1377.8   Median : 1177.2  
 Mean   : 5541.1   Mean   : 5569.5   Mean   : 5011.7   Mean   : 4548.1  
 3rd Qu.: 4588.9   3rd Qu.: 5121.5   3rd Qu.: 4416.3   3rd Qu.: 4545.9  
 Max.   :70461.8   Max.   :69881.1   Max.   :62802.7   Max.   :55949.9  
      2012              2011              2010             2009          
 Min.   :   77.3   Min.   :   33.1   Min.   :   17.5   Length:68         
 1st Qu.:  279.5   1st Qu.:  233.1   1st Qu.:  262.3   Class :character  
 Median : 1008.1   Median : 1023.5   Median :  769.5   Mode  :character  
 Mean   : 4081.9   Mean   : 3751.8   Mean   : 3455.8                     
 3rd Qu.: 3777.1   3rd Qu.: 3442.4   3rd Qu.: 3007.9                     
 Max.   :50496.3   Max.   :48318.2   Max.   :46254.5                     
      2008              2007              2006             2005          
 Min.   :    5.4   Min.   :    1.8   Min.   :    1.6   Length:68         
 1st Qu.:  204.1   1st Qu.:  179.8   1st Qu.:  146.8   Class :character  
 Median :  683.1   Median :  534.7   Median :  441.1   Mode  :character  
 Mean   : 3270.8   Mean   : 2834.3   Mean   : 2384.7                     
 3rd Qu.: 2840.8   3rd Qu.: 2727.7   3rd Qu.: 2028.2                     
 Max.   :43935.5   Max.   :39789.5   Max.   :33098.0                     
     2004               2003               2002               2001          
 Length:68          Length:68          Length:68          Length:68         
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
     2000          
 Length:68         
 Class :character  
 Mode  :character  
                   
                   
                   
Code
export_partners[duplicated(export_partners),]
# A tibble: 0 × 25
# ℹ 25 variables: Data Series <chr>, 2023 <dbl>, 2022 <dbl>, 2021 <dbl>,
#   2020 <dbl>, 2019 <dbl>, 2018 <dbl>, 2017 <dbl>, 2016 <dbl>, 2015 <dbl>,
#   2014 <dbl>, 2013 <dbl>, 2012 <dbl>, 2011 <dbl>, 2010 <dbl>, 2009 <chr>,
#   2008 <dbl>, 2007 <dbl>, 2006 <dbl>, 2005 <chr>, 2004 <chr>, 2003 <chr>,
#   2002 <chr>, 2001 <chr>, 2000 <chr>
Note
  • Ensure that there are no duplicated rows; if any are found, investigate further.
Code
colSums(is.na(export_partners))
Data Series        2023        2022        2021        2020        2019 
          0           0           0           0           0           0 
       2018        2017        2016        2015        2014        2013 
          0           0           0           0           0           0 
       2012        2011        2010        2009        2008        2007 
          0           0           0           0           0           0 
       2006        2005        2004        2003        2002        2001 
          0           0           0           0           0           0 
       2000 
          0 
Note
  • Ensure that there are no NA values; if any are found, investigate further.
  • If needed, missing values can be replaced with 0 in specified columns using the replace_na() function.
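No replacement is needed here since no NAs were found, but as a hedged sketch, replace_na() could be applied across the numeric year columns like this (the choice of 0 as the fill value is an assumption for illustration):

```r
library(dplyr)
library(tidyr)

# Sketch only (no NAs were found above): fill NA values with 0
# across all numeric year columns.
export_partners_filled <- export_partners %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
```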
Code
str(export_partners)
tibble [68 × 25] (S3: tbl_df/tbl/data.frame)
 $ Data Series: chr [1:68] "Asia" "Bangladesh" "Brunei Darussalam" "Cambodia" ...
 $ 2023       : num [1:68] 170788 986 960 476 21686 ...
 $ 2022       : num [1:68] 174243 904 891 390 21307 ...
 $ 2021       : num [1:68] 147126 675 579 256 18126 ...
 $ 2020       : num [1:68] 115989 582 533 250 15063 ...
 $ 2019       : num [1:68] 114573 642 556 259 11273 ...
 $ 2018       : num [1:68] 104183 664 496 290 10466 ...
 $ 2017       : num [1:68] 85093 548 464 227 8240 ...
 $ 2016       : num [1:68] 70462 483 420 196 7352 ...
 $ 2015       : num [1:68] 69881 424 465 114 7260 ...
 $ 2014       : num [1:68] 62802.7 384.3 520.4 86.3 5841.4 ...
 $ 2013       : num [1:68] 55949.9 347.5 474.4 89.2 5184.6 ...
 $ 2012       : num [1:68] 50496 349 464 115 5156 ...
 $ 2011       : num [1:68] 48318 334 468 109 5105 ...
 $ 2010       : num [1:68] 46254.5 272.6 508.1 78.4 4928.8 ...
 $ 2009       : chr [1:68] "40325.300000000003" "209.3" "324.39999999999998" "99.3" ...
 $ 2008       : num [1:68] 43936 265 273 140 4677 ...
 $ 2007       : num [1:68] 39789.5 284 248.3 88.1 4429.7 ...
 $ 2006       : num [1:68] 33098 286 121 111 3555 ...
 $ 2005       : chr [1:68] "28070.400000000001" "274.60000000000002" "123.3" "77.099999999999994" ...
 $ 2004       : chr [1:68] "25457.1" "188.9" "138.6" "68.7" ...
 $ 2003       : chr [1:68] "18942" "157" "151.4" "45.3" ...
 $ 2002       : chr [1:68] "17741.7" "118.3" "144.30000000000001" "37.9" ...
 $ 2001       : chr [1:68] "16226.5" "121.8" "97.8" "29.8" ...
 $ 2000       : chr [1:68] "15816.9" "89.7" "100.5" "18.399999999999999" ...
Note
  • Ensure that all variables are correctly classified by data type; recast variable types if needed.
  • Most variables are correctly classified, but several year columns (2009 and 2000-2005) were read in as character rather than double and will be recast to numeric in Section 5.2.1.4.
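A minimal sketch of such a recast, matching the approach used later when the datasets are combined (non-numeric entries such as "-" are coerced to NA and can then be handled explicitly):

```r
library(dplyr)

# Sketch: recast every four-digit year column to numeric.
# suppressWarnings() silences the coercion warnings for entries
# such as "-", which become NA.
export_partners_recast <- export_partners %>%
  mutate(across(matches("^\\d{4}$"), ~ suppressWarnings(as.numeric(.x))))
```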
5.2.1.3 Understanding the import_partners data structure
Code
glimpse(import_partners)
Rows: 68
Columns: 25
$ `Data Series` <chr> "Asia", "Bangladesh", "Brunei Darussalam", "Cambodia", "…
$ `2023`        <dbl> 139497.1, 468.8, 196.8, 121.2, 20255.7, 13532.6, 3517.7,…
$ `2022`        <dbl> 136729.5, 599.1, 83.5, 133.6, 19762.7, 12913.9, 3632.4, …
$ `2021`        <dbl> 115022.1, 441.8, 53.6, 150.8, 20830.3, 10462.6, 3358.9, …
$ `2020`        <dbl> 93351.0, 392.9, 69.7, 159.6, 14410.3, 8780.3, 2949.9, 22…
$ `2019`        <dbl> 83242.3, 380.9, 49.2, 217.9, 13093.5, 7813.9, 2794.3, 23…
$ `2018`        <dbl> 78004.9, 318.7, 82.2, 116.5, 12234.9, 6612.3, 2494.2, 28…
$ `2017`        <dbl> 65654.5, 251.9, 73.5, 112.7, 10874.2, 5461.9, 2245.5, 32…
$ `2016`        <chr> "55208.6", "233.9", "96.6", "92.6", "8845.4", "4549.6000…
$ `2015`        <dbl> 52852.1, 192.7, 67.1, 82.1, 8664.1, 4035.6, 2170.2, 215.…
$ `2014`        <dbl> 48565.4, 234.8, 53.1, 65.8, 7674.3, 3839.5, 1990.2, 148.…
$ `2013`        <dbl> 40683.5, 176.8, 48.9, 63.5, 5096.7, 3555.7, 1920.0, 138.…
$ `2012`        <dbl> 36390.3, 160.0, 58.3, 127.8, 4233.6, 3039.8, 1816.8, 113…
$ `2011`        <dbl> 34902.8, 149.4, 41.3, 133.2, 3944.7, 2979.5, 1733.9, 74.…
$ `2010`        <dbl> 31712.3, 141.0, 36.6, 110.7, 4279.6, 2700.1, 1780.2, 99.…
$ `2009`        <dbl> 27360.1, 106.5, 40.2, 102.9, 3503.1, 2352.3, 1566.5, 62.…
$ `2008`        <dbl> 28848.5, 82.4, 109.6, 112.4, 3623.9, 2345.5, 1673.7, 60.…
$ `2007`        <dbl> 24679.0, 79.9, 101.2, 100.2, 3520.0, 1869.5, 1587.4, 47.…
$ `2006`        <chr> "22188.6", "105.7", "49.6", "205.6", "2938.8", "1512.1",…
$ `2005`        <chr> "19161.400000000001", "91.2", "36.299999999999997", "69.…
$ `2004`        <chr> "18110.8", "82.2", "60.6", "58", "2222.6999999999998", "…
$ `2003`        <chr> "12607.1", "70.900000000000006", "49.3", "34.5", "1685.2…
$ `2002`        <chr> "10987.5", "77.5", "37.799999999999997", "30", "1585.9",…
$ `2001`        <chr> "9702.7000000000007", "39.700000000000003", "29.4", "20.…
$ `2000`        <chr> "8639.7999999999993", "38.799999999999997", "32.6", "9",…
Code
head(import_partners)
# A tibble: 6 × 25
  `Data Series`   `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016` `2015`
  <chr>            <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>   <dbl>
1 Asia            1.39e5 1.37e5 1.15e5 9.34e4 8.32e4 7.80e4 6.57e4 55208… 5.29e4
2 Bangladesh      4.69e2 5.99e2 4.42e2 3.93e2 3.81e2 3.19e2 2.52e2 233.9  1.93e2
3 Brunei Darussa… 1.97e2 8.35e1 5.36e1 6.97e1 4.92e1 8.22e1 7.35e1 96.6   6.71e1
4 Cambodia        1.21e2 1.34e2 1.51e2 1.60e2 2.18e2 1.16e2 1.13e2 92.6   8.21e1
5 Hong Kong       2.03e4 1.98e4 2.08e4 1.44e4 1.31e4 1.22e4 1.09e4 8845.4 8.66e3
6 India           1.35e4 1.29e4 1.05e4 8.78e3 7.81e3 6.61e3 5.46e3 4549.… 4.04e3
# ℹ 15 more variables: `2014` <dbl>, `2013` <dbl>, `2012` <dbl>, `2011` <dbl>,
#   `2010` <dbl>, `2009` <dbl>, `2008` <dbl>, `2007` <dbl>, `2006` <chr>,
#   `2005` <chr>, `2004` <chr>, `2003` <chr>, `2002` <chr>, `2001` <chr>,
#   `2000` <chr>
Code
summary(import_partners)
 Data Series             2023               2022               2021         
 Length:68          Min.   :    93.1   Min.   :    83.5   Min.   :    53.6  
 Class :character   1st Qu.:   608.7   1st Qu.:   565.6   1st Qu.:   448.8  
 Mode  :character   Median :  2333.1   Median :  2153.3   Median :  1762.8  
                    Mean   : 11464.3   Mean   : 10862.6   Mean   :  9301.5  
                    3rd Qu.:  8176.1   3rd Qu.:  8600.9   3rd Qu.:  6637.0  
                    Max.   :139497.1   Max.   :136729.5   Max.   :115022.1  
      2020              2019              2018              2017        
 Min.   :   69.7   Min.   :   49.2   Min.   :   82.2   Min.   :   73.5  
 1st Qu.:  344.2   1st Qu.:  309.6   1st Qu.:  310.2   1st Qu.:  274.6  
 Median : 1669.7   Median : 1556.3   Median : 1565.8   Median : 1411.4  
 Mean   : 7982.5   Mean   : 7263.2   Mean   : 7045.8   Mean   : 6413.4  
 3rd Qu.: 5550.0   3rd Qu.: 5604.6   3rd Qu.: 5620.9   3rd Qu.: 6136.3  
 Max.   :93351.0   Max.   :83242.3   Max.   :78004.9   Max.   :65654.5  
     2016                2015              2014              2013        
 Length:68          Min.   :   34.3   Min.   :   40.3   Min.   :   48.9  
 Class :character   1st Qu.:  217.2   1st Qu.:  250.5   1st Qu.:  192.3  
 Mode  :character   Median : 1142.8   Median : 1000.9   Median :  901.6  
                    Mean   : 5567.2   Mean   : 5337.7   Mean   : 4623.7  
                    3rd Qu.: 4324.4   3rd Qu.: 5590.8   3rd Qu.: 5151.0  
                    Max.   :52852.1   Max.   :50558.2   Max.   :44479.7  
      2012              2011              2010              2009         
 Min.   :   43.5   Min.   :   27.2   Min.   :    4.7   Min.   :    4.50  
 1st Qu.:  160.7   1st Qu.:  141.7   1st Qu.:  109.5   1st Qu.:   98.22  
 Median :  628.3   Median :  606.6   Median :  438.5   Median :  380.20  
 Mean   : 4006.5   Mean   : 3466.0   Mean   : 3164.2   Mean   : 2726.97  
 3rd Qu.: 3773.5   3rd Qu.: 3178.5   3rd Qu.: 2564.5   3rd Qu.: 2279.35  
 Max.   :36390.3   Max.   :34902.8   Max.   :31712.3   Max.   :27360.10  
      2008               2007              2006               2005          
 Min.   :    4.50   Min.   :    8.90   Length:68          Length:68         
 1st Qu.:   99.97   1st Qu.:   85.83   Class :character   Class :character  
 Median :  407.45   Median :  280.55   Mode  :character   Mode  :character  
 Mean   : 2769.65   Mean   : 2348.02                                        
 3rd Qu.: 2253.03   3rd Qu.: 1720.08                                        
 Max.   :28848.50   Max.   :24679.00                                        
     2004               2003               2002               2001          
 Length:68          Length:68          Length:68          Length:68         
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
     2000          
 Length:68         
 Class :character  
 Mode  :character  
                   
                   
                   
Code
import_partners[duplicated(import_partners),]
# A tibble: 0 × 25
# ℹ 25 variables: Data Series <chr>, 2023 <dbl>, 2022 <dbl>, 2021 <dbl>,
#   2020 <dbl>, 2019 <dbl>, 2018 <dbl>, 2017 <dbl>, 2016 <chr>, 2015 <dbl>,
#   2014 <dbl>, 2013 <dbl>, 2012 <dbl>, 2011 <dbl>, 2010 <dbl>, 2009 <dbl>,
#   2008 <dbl>, 2007 <dbl>, 2006 <chr>, 2005 <chr>, 2004 <chr>, 2003 <chr>,
#   2002 <chr>, 2001 <chr>, 2000 <chr>
Note
  • Ensure that there are no duplicated rows; if any are found, investigate further.
Code
colSums(is.na(import_partners))
Data Series        2023        2022        2021        2020        2019 
          0           0           0           0           0           0 
       2018        2017        2016        2015        2014        2013 
          0           0           0           0           0           0 
       2012        2011        2010        2009        2008        2007 
          0           0           0           0           0           0 
       2006        2005        2004        2003        2002        2001 
          0           0           0           0           0           0 
       2000 
          0 
Note
  • Ensure that there are no NA values; if any are found, investigate further.
  • If needed, missing values can be replaced with 0 in specified columns using the replace_na() function.
Code
str(import_partners)
tibble [68 × 25] (S3: tbl_df/tbl/data.frame)
 $ Data Series: chr [1:68] "Asia" "Bangladesh" "Brunei Darussalam" "Cambodia" ...
 $ 2023       : num [1:68] 139497 469 197 121 20256 ...
 $ 2022       : num [1:68] 136729.5 599.1 83.5 133.6 19762.7 ...
 $ 2021       : num [1:68] 115022.1 441.8 53.6 150.8 20830.3 ...
 $ 2020       : num [1:68] 93351 392.9 69.7 159.6 14410.3 ...
 $ 2019       : num [1:68] 83242.3 380.9 49.2 217.9 13093.5 ...
 $ 2018       : num [1:68] 78004.9 318.7 82.2 116.5 12234.9 ...
 $ 2017       : num [1:68] 65654.5 251.9 73.5 112.7 10874.2 ...
 $ 2016       : chr [1:68] "55208.6" "233.9" "96.6" "92.6" ...
 $ 2015       : num [1:68] 52852.1 192.7 67.1 82.1 8664.1 ...
 $ 2014       : num [1:68] 48565.4 234.8 53.1 65.8 7674.3 ...
 $ 2013       : num [1:68] 40683.5 176.8 48.9 63.5 5096.7 ...
 $ 2012       : num [1:68] 36390.3 160 58.3 127.8 4233.6 ...
 $ 2011       : num [1:68] 34902.8 149.4 41.3 133.2 3944.7 ...
 $ 2010       : num [1:68] 31712.3 141 36.6 110.7 4279.6 ...
 $ 2009       : num [1:68] 27360.1 106.5 40.2 102.9 3503.1 ...
 $ 2008       : num [1:68] 28848.5 82.4 109.6 112.4 3623.9 ...
 $ 2007       : num [1:68] 24679 79.9 101.2 100.2 3520 ...
 $ 2006       : chr [1:68] "22188.6" "105.7" "49.6" "205.6" ...
 $ 2005       : chr [1:68] "19161.400000000001" "91.2" "36.299999999999997" "69.900000000000006" ...
 $ 2004       : chr [1:68] "18110.8" "82.2" "60.6" "58" ...
 $ 2003       : chr [1:68] "12607.1" "70.900000000000006" "49.3" "34.5" ...
 $ 2002       : chr [1:68] "10987.5" "77.5" "37.799999999999997" "30" ...
 $ 2001       : chr [1:68] "9702.7000000000007" "39.700000000000003" "29.4" "20.9" ...
 $ 2000       : chr [1:68] "8639.7999999999993" "38.799999999999997" "32.6" "9" ...
Note
  • Ensure that all variables are correctly classified by data type; recast variable types if needed.
  • Most variables are correctly classified, but several year columns (2016, 2006, and 2000-2005) were read in as character rather than double and will be recast to numeric in Section 5.2.1.4.
5.2.1.4 Combining the export_partners and import_partners

To ensure consistency, column names were converted to lowercase and spaces were replaced with underscores. Since numerical values were stored with inconsistent data types across the two datasets, all year columns (2000-2023) were converted to numeric to prevent data type mismatches. Additionally, any missing (NA) values or dashes ("-") were replaced with 0 to ensure data integrity. A "category" column was added to distinguish between Exports and Imports. Finally, the two datasets were combined using bind_rows().

Important
  • Year columns (2000-2023) in both the Exports and Imports datasets were converted to numeric format to ensure consistency and prevent data type mismatches.
  • There are missing (NA) values and "-" characters in the dataset.
  • NA values and "-" are replaced with 0 using the mutate() and replace() functions.
Code
# Ensure column names are consistent
colnames(export_partners) <- tolower(gsub(" ", "_", colnames(export_partners)))
colnames(import_partners) <- tolower(gsub(" ", "_", colnames(import_partners)))

# Convert all numeric year columns to numeric type safely by handling non-numeric values and replacing NA and "-" with 0
export_partners <- export_partners %>% mutate(across(matches("^\\d{4}$"), ~suppressWarnings(as.numeric(.)))) %>%
    mutate(across(matches("^\\d{4}$"), ~replace(., is.na(.) | . == "-", 0)))
import_partners <- import_partners %>% mutate(across(matches("^\\d{4}$"), ~suppressWarnings(as.numeric(.)))) %>%
    mutate(across(matches("^\\d{4}$"), ~replace(., is.na(.) | . == "-", 0)))

# Add a column to indicate the category (Exports, Imports)
export_partners <- export_partners %>% mutate(category = "Exports")
import_partners <- import_partners %>% mutate(category = "Imports")

# Combine both datasets
combined_data <- bind_rows(export_partners, import_partners)
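A quick sanity check can confirm that the bind worked as expected; this is an illustrative sketch rather than part of the original workflow:

```r
# Sketch: the combined tibble should contain one Exports and one
# Imports block of 68 rows each (136 rows in total).
stopifnot(nrow(combined_data) == nrow(export_partners) + nrow(import_partners))
table(combined_data$category)
```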
5.2.1.5 Understanding the combined_data data structure

The combined_data tibble contains 26 attributes, as shown below.

The following preprocessing checks were conducted as part of data preparation:

Preprocessing Checks
  • Verified that the correct data types were loaded in the combined_data dataset using glimpse() and str()
  • Ensured there were no duplicate variable names using duplicated() in the dataset
  • Checked for missing values using colSums(is.na())
Code
glimpse(combined_data)
Rows: 136
Columns: 26
$ data_series <chr> "Asia", "Bangladesh", "Brunei Darussalam", "Cambodia", "Ho…
$ `2023`      <dbl> 170787.7, 985.7, 960.1, 475.6, 21686.1, 9909.4, 8542.6, 81…
$ `2022`      <dbl> 174243.3, 903.9, 891.2, 389.5, 21307.3, 9730.1, 8366.6, 58…
$ `2021`      <dbl> 147125.9, 674.8, 579.0, 256.0, 18126.2, 7817.0, 6222.5, 50…
$ `2020`      <dbl> 115989.3, 581.6, 533.4, 250.3, 15063.2, 6337.6, 5351.4, 35…
$ `2019`      <dbl> 114573.1, 642.4, 556.4, 259.1, 11273.2, 6383.4, 6781.8, 25…
$ `2018`      <dbl> 104182.5, 664.0, 495.9, 290.4, 10466.3, 5957.8, 5895.2, 24…
$ `2017`      <dbl> 85093.3, 547.7, 463.9, 226.8, 8239.5, 5326.3, 5615.7, 254.…
$ `2016`      <dbl> 70461.8, 482.6, 420.5, 196.1, 7351.8, 4422.4, 5088.6, 231.…
$ `2015`      <dbl> 69881.1, 424.3, 465.4, 114.3, 7260.0, 4236.4, 5032.0, 175.…
$ `2014`      <dbl> 62802.7, 384.3, 520.4, 86.3, 5841.4, 4276.0, 4837.1, 159.2…
$ `2013`      <dbl> 55949.9, 347.5, 474.4, 89.2, 5184.6, 4857.4, 4535.8, 155.1…
$ `2012`      <dbl> 50496.3, 349.1, 464.1, 115.0, 5156.3, 4375.0, 4062.7, 146.…
$ `2011`      <dbl> 48318.2, 333.5, 468.5, 109.2, 5104.8, 4244.7, 3537.8, 132.…
$ `2010`      <dbl> 46254.5, 272.6, 508.1, 78.4, 4928.8, 4089.3, 3291.1, 114.2…
$ `2009`      <dbl> 40325.3, 209.3, 324.4, 99.3, 4344.5, 3712.2, 3642.6, 89.2,…
$ `2008`      <dbl> 43935.5, 264.8, 272.6, 139.7, 4676.6, 3502.5, 3321.9, 93.3…
$ `2007`      <dbl> 39789.5, 284.0, 248.3, 88.1, 4429.7, 3191.3, 3026.0, 136.5…
$ `2006`      <dbl> 33098.0, 285.5, 120.6, 110.8, 3555.3, 2448.5, 2969.8, 102.…
$ `2005`      <dbl> 28070.4, 274.6, 123.3, 77.1, 2790.5, 2084.6, 2364.4, 90.4,…
$ `2004`      <dbl> 25457.1, 188.9, 138.6, 68.7, 2313.7, 1742.1, 2231.8, 87.6,…
$ `2003`      <dbl> 18942.0, 157.0, 151.4, 45.3, 2059.7, 1241.4, 1767.5, 63.0,…
$ `2002`      <dbl> 17741.7, 118.3, 144.3, 37.9, 1912.0, 907.6, 1801.4, 51.5, …
$ `2001`      <dbl> 16226.5, 121.8, 97.8, 29.8, 1866.5, 735.4, 1707.5, 24.6, 3…
$ `2000`      <dbl> 15816.9, 89.7, 100.5, 18.4, 1860.6, 746.8, 1607.9, 28.9, 3…
$ category    <chr> "Exports", "Exports", "Exports", "Exports", "Exports", "Ex…
Code
head(combined_data)
# A tibble: 6 × 26
  data_series     `2023` `2022` `2021` `2020` `2019` `2018` `2017` `2016` `2015`
  <chr>            <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Asia            1.71e5 1.74e5 1.47e5 1.16e5 1.15e5 1.04e5 85093. 70462. 69881.
2 Bangladesh      9.86e2 9.04e2 6.75e2 5.82e2 6.42e2 6.64e2   548.   483.   424.
3 Brunei Darussa… 9.60e2 8.91e2 5.79e2 5.33e2 5.56e2 4.96e2   464.   420.   465.
4 Cambodia        4.76e2 3.89e2 2.56e2 2.50e2 2.59e2 2.90e2   227.   196.   114.
5 Hong Kong       2.17e4 2.13e4 1.81e4 1.51e4 1.13e4 1.05e4  8240.  7352.  7260 
6 India           9.91e3 9.73e3 7.82e3 6.34e3 6.38e3 5.96e3  5326.  4422.  4236.
# ℹ 16 more variables: `2014` <dbl>, `2013` <dbl>, `2012` <dbl>, `2011` <dbl>,
#   `2010` <dbl>, `2009` <dbl>, `2008` <dbl>, `2007` <dbl>, `2006` <dbl>,
#   `2005` <dbl>, `2004` <dbl>, `2003` <dbl>, `2002` <dbl>, `2001` <dbl>,
#   `2000` <dbl>, category <chr>
Code
summary(combined_data)
 data_series             2023               2022               2021         
 Length:136         Min.   :    93.1   Min.   :    83.5   Min.   :    53.6  
 Class :character   1st Qu.:   701.3   1st Qu.:   596.2   1st Qu.:   492.4  
 Mode  :character   Median :  2595.8   Median :  2330.3   Median :  1957.3  
                    Mean   : 12524.7   Mean   : 12169.6   Mean   : 10353.0  
                    3rd Qu.:  8674.3   3rd Qu.:  8664.1   3rd Qu.:  7272.9  
                    Max.   :170787.7   Max.   :174243.3   Max.   :147125.9  
      2020               2019               2018               2017        
 Min.   :    69.7   Min.   :    49.2   Min.   :    82.2   Min.   :   67.9  
 1st Qu.:   388.6   1st Qu.:   408.8   1st Qu.:   378.1   1st Qu.:  311.0  
 Median :  1725.6   Median :  1685.2   Median :  1597.5   Median : 1364.0  
 Mean   :  8348.4   Mean   :  7773.0   Mean   :  7410.0   Mean   : 6392.0  
 3rd Qu.:  6002.1   3rd Qu.:  6162.6   3rd Qu.:  5910.9   3rd Qu.: 5726.2  
 Max.   :115989.3   Max.   :114573.1   Max.   :104182.5   Max.   :85093.3  
      2016            2015              2014              2013        
 Min.   :    0   Min.   :   34.3   Min.   :   40.3   Min.   :   48.9  
 1st Qu.:  269   1st Qu.:  283.6   1st Qu.:  276.0   1st Qu.:  239.6  
 Median : 1188   Median : 1250.4   Median : 1148.3   Median : 1018.6  
 Mean   : 5509   Mean   : 5568.4   Mean   : 5174.7   Mean   : 4585.9  
 3rd Qu.: 4716   3rd Qu.: 5071.8   3rd Qu.: 4882.9   3rd Qu.: 4602.4  
 Max.   :70462   Max.   :69881.1   Max.   :62802.7   Max.   :55949.9  
      2012              2011              2010              2009        
 Min.   :   43.5   Min.   :   27.2   Min.   :    4.7   Min.   :    0.0  
 1st Qu.:  202.4   1st Qu.:  193.8   1st Qu.:  146.8   1st Qu.:  127.5  
 Median :  809.0   Median :  746.0   Median :  599.6   Median :  514.7  
 Mean   : 4044.2   Mean   : 3608.9   Mean   : 3310.0   Mean   : 2888.4  
 3rd Qu.: 3777.1   3rd Qu.: 3442.4   3rd Qu.: 2625.2   3rd Qu.: 2342.0  
 Max.   :50496.3   Max.   :48318.2   Max.   :46254.5   Max.   :40325.3  
      2008              2007              2006              2005         
 Min.   :    4.5   Min.   :    1.8   Min.   :    0.0   Min.   :    0.00  
 1st Qu.:  152.4   1st Qu.:  112.6   1st Qu.:  114.9   1st Qu.:   72.92  
 Median :  549.5   Median :  441.4   Median :  354.6   Median :  321.20  
 Mean   : 3020.2   Mean   : 2591.2   Mean   : 2250.5   Mean   : 1878.82  
 3rd Qu.: 2357.7   3rd Qu.: 1896.8   3rd Qu.: 1785.8   3rd Qu.: 1518.40  
 Max.   :43935.5   Max.   :39789.5   Max.   :33098.0   Max.   :28070.40  
      2004               2003               2002               2001         
 Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
 1st Qu.:   55.27   1st Qu.:   34.12   1st Qu.:   31.05   1st Qu.:   22.55  
 Median :  257.65   Median :  195.60   Median :  127.45   Median :  107.30  
 Mean   : 1699.77   Mean   : 1350.57   Mean   : 1132.43   Mean   : 1019.79  
 3rd Qu.: 1530.25   3rd Qu.: 1140.42   3rd Qu.:  890.12   3rd Qu.:  767.73  
 Max.   :25457.10   Max.   :18942.00   Max.   :17741.70   Max.   :16226.50  
      2000            category        
 Min.   :    0.00   Length:136        
 1st Qu.:   19.18   Class :character  
 Median :   98.50   Mode  :character  
 Mean   :  962.32                     
 3rd Qu.:  679.02                     
 Max.   :15816.90                     
Code
combined_data[duplicated(combined_data),]
# A tibble: 0 × 26
# ℹ 26 variables: data_series <chr>, 2023 <dbl>, 2022 <dbl>, 2021 <dbl>,
#   2020 <dbl>, 2019 <dbl>, 2018 <dbl>, 2017 <dbl>, 2016 <dbl>, 2015 <dbl>,
#   2014 <dbl>, 2013 <dbl>, 2012 <dbl>, 2011 <dbl>, 2010 <dbl>, 2009 <dbl>,
#   2008 <dbl>, 2007 <dbl>, 2006 <dbl>, 2005 <dbl>, 2004 <dbl>, 2003 <dbl>,
#   2002 <dbl>, 2001 <dbl>, 2000 <dbl>, category <chr>
Note
  • Ensure that there are no duplicated rows; if any are found, investigate further.
Code
colSums(is.na(combined_data))
data_series        2023        2022        2021        2020        2019 
          0           0           0           0           0           0 
       2018        2017        2016        2015        2014        2013 
          0           0           0           0           0           0 
       2012        2011        2010        2009        2008        2007 
          0           0           0           0           0           0 
       2006        2005        2004        2003        2002        2001 
          0           0           0           0           0           0 
       2000    category 
          0           0 
Note
  • Ensure that there are no NA values; if any are found, investigate further.
  • If needed, missing values can be replaced with 0 in specified columns using the replace_na() function.
Code
str(combined_data)
tibble [136 × 26] (S3: tbl_df/tbl/data.frame)
 $ data_series: chr [1:136] "Asia" "Bangladesh" "Brunei Darussalam" "Cambodia" ...
 $ 2023       : num [1:136] 170788 986 960 476 21686 ...
 $ 2022       : num [1:136] 174243 904 891 390 21307 ...
 $ 2021       : num [1:136] 147126 675 579 256 18126 ...
 $ 2020       : num [1:136] 115989 582 533 250 15063 ...
 $ 2019       : num [1:136] 114573 642 556 259 11273 ...
 $ 2018       : num [1:136] 104183 664 496 290 10466 ...
 $ 2017       : num [1:136] 85093 548 464 227 8240 ...
 $ 2016       : num [1:136] 70462 483 420 196 7352 ...
 $ 2015       : num [1:136] 69881 424 465 114 7260 ...
 $ 2014       : num [1:136] 62802.7 384.3 520.4 86.3 5841.4 ...
 $ 2013       : num [1:136] 55949.9 347.5 474.4 89.2 5184.6 ...
 $ 2012       : num [1:136] 50496 349 464 115 5156 ...
 $ 2011       : num [1:136] 48318 334 468 109 5105 ...
 $ 2010       : num [1:136] 46254.5 272.6 508.1 78.4 4928.8 ...
 $ 2009       : num [1:136] 40325.3 209.3 324.4 99.3 4344.5 ...
 $ 2008       : num [1:136] 43936 265 273 140 4677 ...
 $ 2007       : num [1:136] 39789.5 284 248.3 88.1 4429.7 ...
 $ 2006       : num [1:136] 33098 286 121 111 3555 ...
 $ 2005       : num [1:136] 28070.4 274.6 123.3 77.1 2790.5 ...
 $ 2004       : num [1:136] 25457.1 188.9 138.6 68.7 2313.7 ...
 $ 2003       : num [1:136] 18942 157 151.4 45.3 2059.7 ...
 $ 2002       : num [1:136] 17741.7 118.3 144.3 37.9 1912 ...
 $ 2001       : num [1:136] 16226.5 121.8 97.8 29.8 1866.5 ...
 $ 2000       : num [1:136] 15816.9 89.7 100.5 18.4 1860.6 ...
 $ category   : chr [1:136] "Exports" "Exports" "Exports" "Exports" ...
Note
  • Ensure that all variables are correctly classified by data type; recast variable types if needed.
  • Variables are now correctly classified: the categorical variables (data_series, category) are character, while all year columns are double.
5.2.1.6 Categorizing trading partners

A new “category” column will be added to classify each trading partner as a Country, Region, or Economic & Political Union. This helps organize the data, making it easier to analyze trade patterns across different entity types.

types data_series
Country Bangladesh, Brunei Darussalam, Cambodia, Hong Kong, India, Indonesia, Israel, Japan, Kuwait, Mainland China, Malaysia, Myanmar, Pakistan, Philippines, Qatar, Republic Of Korea, Saudi Arabia, Sri Lanka, Taiwan, Thailand, Turkiye, United Arab Emirates, Vietnam, Belgium, Cyprus, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, Netherlands, Norway, Portugal, Russian Federation, Spain, Sweden, Switzerland, United Kingdom, United States Of America, Canada, Australia, Marshall Islands, New Zealand, Papua New Guinea, Bermuda, Brazil, British Virgin Islands, Cayman Islands, Chile, Mexico, Panama, Peru, Egypt, Liberia, Mauritius, Nigeria, South Africa
Region Asia, Europe, North America, Oceania, South And Central America And The Caribbean, Africa
Economic & Political Union ASEAN, European Union (EU-27)
Code
# Load the categorization mapping from Excel
categorization_mapping <- read_excel("data/categorizationmapping.xlsx")

# Ensure column names are consistent
colnames(categorization_mapping) <- tolower(gsub(" ", "_", colnames(categorization_mapping)))

# Merge only the type into combined_data
combined_data <- combined_data %>% left_join(categorization_mapping, by = "data_series")
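If the Excel mapping file is unavailable, an equivalent mapping can be built directly in R; this hypothetical sketch shows only a few illustrative rows (the full mapping covers all 68 trading partners):

```r
library(tibble)

# Hypothetical sketch of the categorization mapping built in code;
# only a handful of illustrative rows are shown.
categorization_mapping <- tribble(
  ~data_series, ~types,
  "Bangladesh", "Country",
  "Asia",       "Region",
  "ASEAN",      "Economic & Political Union"
)
```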
5.2.1.7 Aggregating the trade values across the years

To calculate the total trade values for each trading partner, the dataset was grouped by “data_series”, and trade values from 2000 to 2023 were summed using summarise(), ensuring missing values were ignored with na.rm = TRUE. A “category” column was assigned the value “Total” to indicate aggregated trade figures. The computed totals were then appended to the original dataset using bind_rows().

Code
# Summing up the values for each trading partner across all years (2000-2023)
summed_data <- combined_data %>%
    group_by(data_series, types) %>%
    summarise(across(matches("^\\d{4}$"), ~ sum(.x, na.rm = TRUE)), .groups = "drop") %>%
    mutate(category = "Total")

# Add the total category to the combined dataset
final_data <- bind_rows(combined_data, summed_data)

# Reorder columns to move category to the second position
final_data <- final_data %>% select(data_series, category, types, everything())

# Display a preview of the merged data
print(head(final_data))
# A tibble: 6 × 27
  data_series    category types `2023` `2022` `2021` `2020` `2019` `2018` `2017`
  <chr>          <chr>    <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Asia           Exports  Regi… 1.71e5 1.74e5 1.47e5 1.16e5 1.15e5 1.04e5 85093.
2 Bangladesh     Exports  Coun… 9.86e2 9.04e2 6.75e2 5.82e2 6.42e2 6.64e2   548.
3 Brunei Daruss… Exports  Coun… 9.60e2 8.91e2 5.79e2 5.33e2 5.56e2 4.96e2   464.
4 Cambodia       Exports  Coun… 4.76e2 3.89e2 2.56e2 2.50e2 2.59e2 2.90e2   227.
5 Hong Kong      Exports  Coun… 2.17e4 2.13e4 1.81e4 1.51e4 1.13e4 1.05e4  8240.
6 India          Exports  Coun… 9.91e3 9.73e3 7.82e3 6.34e3 6.38e3 5.96e3  5326.
# ℹ 17 more variables: `2016` <dbl>, `2015` <dbl>, `2014` <dbl>, `2013` <dbl>,
#   `2012` <dbl>, `2011` <dbl>, `2010` <dbl>, `2009` <dbl>, `2008` <dbl>,
#   `2007` <dbl>, `2006` <dbl>, `2005` <dbl>, `2004` <dbl>, `2003` <dbl>,
#   `2002` <dbl>, `2001` <dbl>, `2000` <dbl>
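As a hedged sketch, the aggregation can be spot-checked by confirming that a Total row equals the sum of the Exports and Imports rows for a single partner and year:

```r
library(dplyr)

# Sketch: verify Total = Exports + Imports for one partner and year.
asia <- final_data %>% filter(data_series == "Asia")
total_2023   <- asia %>% filter(category == "Total")   %>% pull(`2023`)
exports_2023 <- asia %>% filter(category == "Exports") %>% pull(`2023`)
imports_2023 <- asia %>% filter(category == "Imports") %>% pull(`2023`)
stopifnot(isTRUE(all.equal(total_2023, exports_2023 + imports_2023)))
```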
Code
# library(openxlsx)
# 
# write.xlsx(final_data, "final_data.xlsx")

5.2.2 Revised visualization - pie chart [region]

Key makeover changes:

1️⃣ Breaking down trade data by type - e.g. countries, regions, and economic & political unions

  • The original visualization mixed countries, regions, and economic & political unions together, making trade relationships harder to interpret.

  • Revised visualization will categorize data into different types, ensuring clarity.

2️⃣ Pie chart for region-level aggregation:

  • Pie chart groups trade volumes by region –>Provide a high-level trade distribution which allow users to quickly assess which regions contribute the most to total trade.
Note
  • Easier comparison –> Highlights contributions from countries, regions, and unions seperately.
  • Pie chart only showcase the Major Trading Partners for Trade in Services 2023 - Total Trade Volume Distribution by Region

Code
library(ggplot2)
library(dplyr)
library(viridis)
library(ggrepel)

# Filter the trade data for regions
region_trade_data <- final_data %>%
  filter(types == "Region", category == "Total") %>%
  select(region = data_series, trade_value = `2023`)

# Ensure correct ordering based on trade value
region_trade_data <- region_trade_data %>%
  arrange(desc(trade_value))

# Compute cumulative sum for positioning
region_trade_data <- region_trade_data %>%
  mutate(label = paste0(region, "\n$", format(trade_value, big.mark = ",")),
         pos = cumsum(trade_value) - (0.5 * trade_value))  

# Create Static Pie Chart with Labels Outside
static_pie_chart <- ggplot(region_trade_data, aes(x = "", y = trade_value, fill = trade_value)) +
  geom_bar(stat = "identity", width = 1, color = "white") +  
  coord_polar(theta = "y") +
  scale_fill_viridis(option = "viridis", direction = -1) +  
  geom_text_repel(aes(y = pos, label = label), size = 3, color = "black", nudge_x = 1, box.padding = 0.5) +  
  theme_void() +
  labs(
    title = "Major Trading Partners for Trade in Services, 2023",
    subtitle = "Total Trade Volume Distribution by Region",
    fill = "Trade Value (SGD)"
  ) +
  theme(
    legend.position = "right",
    plot.title = element_text(hjust = 0, face = "bold", size = 14),  # Left-aligned, bold title
    plot.subtitle = element_text(hjust = 0, size = 12)  # Left-aligned subtitle
  )

# Show the static pie chart
print(static_pie_chart)

5.2.3 Revised visualization - worldmap [countries]

Key makeover changes:

1️⃣ Enhanced trade value representation with a geo-spatial context:

  • The original visualization uses static infographics with flag-based indicators; the revised visualization presents an interactive world map, allowing for dynamic exploration.

  • A color gradient (from yellow to purple) effectively communicates trade value intensity in SGD.

Code
library(ggplot2)
library(dplyr)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(viridis)
library(plotly)

# Load world map data
world <- ne_countries(scale = "medium", returnclass = "sf")

# Filter the trade data (assuming it's already in your R environment as `final_data`)
trade_data <- final_data %>%
  filter(types == "Country", category == "Total") %>%
  select(region = data_series, trade_value = `2023`)

# Standardize region names
trade_data$region <- tolower(trimws(trade_data$region))
world$name <- tolower(trimws(world$name))

# Correct mismatched country names
trade_data$region <- recode(trade_data$region,
                             "mainland china" = "china",
                             "republic of korea" = "south korea",
                             "russian federation" = "russia",
                             "turkiye" = "turkey",
                             "british virgin islands" = "british virgin is.",
                             "cayman islands" = "cayman is.",
                             "brunei darussalam" = "brunei",
                             "marshall islands" = "marshall is.")

# Merge world map with trade data
world <- world %>%
  left_join(trade_data, by = c("name" = "region"))

# Create a ggplot object with title and subtitle
ggplot_map <- ggplot(data = world) +
  geom_sf(aes(fill = trade_value, text = paste("Country:", name, "<br>Trade Value: $", trade_value)), 
          color = "black", size = 0.2) +
  scale_fill_viridis(option = "viridis", direction = -1, na.value = "gray90") +
  theme_minimal() +
  labs(
    title = "Major Trading Partners for Trade in Services, 2023",
    subtitle = "Total Trade Value Distribution by Countries",
    fill = "Trade Value (SGD)"
  ) +
  theme(
    plot.title = element_text(hjust = 0, face = "bold", size = 14), 
    plot.subtitle = element_text(hjust = 0, size = 12)  
  )

# Convert to interactive Plotly map and manually add subtitle
plotly_map <- ggplotly(ggplot_map, tooltip = "text") %>%
  layout(
    annotations = list(
      list(
        text = "<b>Total Trade Value Distribution by Countries</b>",
        x = 0,  # Left align
        y = 1.03,  # Position above plot
        xref = "paper",
        yref = "paper",
        showarrow = FALSE,
        font = list(size = 12)
      )
    )
  )

# Show interactive map
plotly_map
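Any country name that still fails to match a map polygon after the recode step silently becomes NA and renders gray on the map. A quick sanity check, sketched below under the assumption that trade_data and world exist as created above, lists the unmatched entries before plotting:

```r
library(dplyr)
library(sf)

# Sketch: list trade-data countries with no matching map polygon.
# These rows would otherwise appear as gray/NA regions on the map.
unmatched <- trade_data %>%
  anti_join(world %>% st_drop_geometry() %>% select(name),
            by = c("region" = "name"))

print(unmatched)  # zero rows means every country matched
```

If this returns any rows, add the missing name pairs to the recode() call above and re-run the join.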

5.3 Time series analysis

Key observations:

  • Total trade volume

    • Top 5 trade partners: United States, Japan, Mainland China, Australia, and United Kingdom

    • Trends: Significant growth post-2010, with accelerated trade activity from 2018 onward, and Asia and North America dominating trade volume.

  • Exports

    • Top 5 export destinations: United States, Japan, Australia, Mainland China, and United Kingdom
  • Imports

    • Top 5 import sources: United States, Mainland China, Japan, United Kingdom, and Hong Kong.
Note
  • United States, Mainland China, Japan, and Australia consistently remain Singapore’s top trade partners in services
  • Exports and imports have surged post-2018, reinforcing Singapore’s role as a global trade hub
  • Digitalization and financial sector growth are key factors shaping the country’s trade landscape

Code
library(ggplot2)
library(dplyr)
library(tidyr)
library(viridis)
library(forcats)  # for fct_reorder()

# Filter trade data for countries (use the aggregated "Total" category to match the plot title)
trade_data <- final_data %>%
  filter(types == "Country", category == "Total") %>%
  select(country = data_series, `2000`:`2023`)  # Select years from 2000 to 2023

# Reshape data from wide to long format
trade_data_long <- trade_data %>%
  pivot_longer(cols = `2000`:`2023`, names_to = "year", values_to = "trade_value") %>%
  mutate(year = as.numeric(year))  # Convert year to numeric

# Calculate total trade value per country and reorder factors (highest at the top)
trade_data_long <- trade_data_long %>%
  group_by(country) %>%
  mutate(total_trade_value = sum(trade_value, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(country = fct_reorder(country, total_trade_value, .desc = TRUE))  

# Create heatmap-style plot using geom_tile()
hori_plot <- ggplot(trade_data_long, aes(x = year, y = country, fill = trade_value)) +
  geom_tile(color = "white") +  # Add white borders between tiles
  scale_fill_viridis_c(option = "magma", na.value = "gray90") +  # Color gradient for intensity
  labs(
    title = "Total Trade Value Trends (2000-2023)",
    subtitle = "Major Trading Partners for Trade in Services",
    x = "Year",
    y = "Country",
    fill = "Trade Value (SGD)"
  ) +
  theme_minimal() +
  theme(
    axis.text.y = element_text(size = 8),  # Adjust country label size
    axis.text.x = element_text(angle = 45, hjust = 1),  # Rotate year labels
    plot.title = element_text(hjust = 0, face = "bold", size = 14),  # Left-aligned title
    plot.subtitle = element_text(hjust = 0, size = 12)  # Left-aligned subtitle
  )

# Show plot
print(hori_plot)

5.4 Time series forecast

In the code below, we explore the tidymodels (modeltime) approach to time-series forecasting.

5.4.1 Data sampling

Code
# Load all necessary packages
library(tidyverse)
library(modeltime)
library(timetk)
library(lubridate)
library(rsample)
library(parsnip)
library(recipes)
library(workflows)
library(yardstick)

# Convert wide format to long format
long_data <- final_data %>%
    pivot_longer(cols = matches("^\\d{4}$"), names_to = "year", values_to = "value") %>%
    mutate(year = as.integer(year)) 

# Convert year to Date format
long_data <- long_data %>%
    mutate(date = make_date(year = year, month = 1, day = 1)) %>%
    select(data_series, category, types, date, value)

# Filter for a specific country and only "Exports"
selected_series <- long_data %>%
    filter(data_series == "United States Of America", category == "Exports") %>%
    select(date, value) %>%
    arrange(date)

# Remove duplicates if any
selected_series <- selected_series %>%
    distinct(date, .keep_all = TRUE)

# Perform a time-based split (80% training, 20% testing)
splits <- initial_time_split(selected_series, prop = 0.8)

cat("The training dataset contains", nrow(training(splits)), "observations.\n")
The training dataset contains 19 observations.
Code
cat("The testing dataset consists of", nrow(testing(splits)), "observations.\n")
The testing dataset consists of 5 observations.

5.4.2 Create and fit multiple models

In the code below, we will fit four models:

  • Error-Trend-Season (ETS) model by using exp_smoothing()
  • Auto ARIMA model by using arima_reg()
  • Boosted Auto ARIMA by using arima_boost()
  • Prophet model by using prophet_reg()

Code
model_fit_ets <- exp_smoothing() %>%
    set_engine("ets") %>%
    fit(value ~ date, data = training(splits))

model_fit_arima <- arima_reg() %>%
    set_engine("auto_arima") %>%
    fit(value ~ date, data = training(splits))

model_fit_arima_boosted <- arima_boost(
  min_n = 2,
  learn_rate = 0.015) %>%
  set_engine("auto_arima_xgboost") %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet <- prophet_reg() %>%
    set_engine("prophet") %>%
    fit(value ~ date, data = training(splits))

5.4.3 Add fitted models to a Model Table

Next, we will use modeltime_table of modeltime package to add each of the models to a Modeltime Table.

Code
models_tbl <- modeltime_table(
    model_fit_ets,
    model_fit_arima,
    model_fit_arima_boosted,
    model_fit_prophet
)

print(models_tbl)
# Modeltime Table
# A tibble: 4 × 3
  .model_id .model   .model_desc            
      <int> <list>   <chr>                  
1         1 <fit[+]> ETS(M,A,N)             
2         2 <fit[+]> ARIMA(0,1,0) WITH DRIFT
3         3 <fit[+]> ARIMA(0,1,0) WITH DRIFT
4         4 <fit[+]> PROPHET                

5.4.4 Calibrate the model to a testing set

We will then use modeltime_calibrate() to add a new column called .calibration_data to the newly created models_tbl tibble.

Code
calibration_tbl <- models_tbl %>%
    modeltime_calibrate(new_data = testing(splits))

print(calibration_tbl)
# Modeltime Table
# A tibble: 4 × 5
  .model_id .model   .model_desc             .type .calibration_data
      <int> <list>   <chr>                   <chr> <list>           
1         1 <fit[+]> ETS(M,A,N)              Test  <tibble [5 × 4]> 
2         2 <fit[+]> ARIMA(0,1,0) WITH DRIFT Test  <tibble [5 × 4]> 
3         3 <fit[+]> ARIMA(0,1,0) WITH DRIFT Test  <tibble [5 × 4]> 
4         4 <fit[+]> PROPHET                 Test  <tibble [5 × 4]> 

5.4.5 Model accuracy assessment

We will assess the accuracy of the models in two ways: (1) accuracy metrics, and (2) visualization.

5.4.5.1 Means of accuracy metrics

modeltime_accuracy() of the modeltime package is used to compute the accuracy metrics. Then, table_modeltime_accuracy() is used to present the accuracy metrics in tabular form.

Code
calibration_tbl %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table
.model_id .model_desc .type mae mape mase smape rmse rsq
1 ETS(M,A,N) Test 16181.62 31.88 1.49 40.03 19413.03 0.73
2 ARIMA(0,1,0) WITH DRIFT Test 13059.38 24.56 1.20 30.18 16738.06 0.73
3 ARIMA(0,1,0) WITH DRIFT Test 13059.38 24.56 1.20 30.18 16738.06 0.73
4 PROPHET Test 16031.61 31.46 1.48 39.53 19329.48 0.61
5.4.5.2 Visualization

We can also use the interactive plotly visualization to assess the accuracy of the models.

Code
calibration_tbl %>%
    modeltime_forecast(new_data = testing(splits), actual_data = selected_series) %>%
    plot_modeltime_forecast()

5.4.6 Refit to full dataset & forecast forward

Next, we use modeltime_refit() to refit the models to the full dataset and forecast them forward.

Then, modeltime_forecast() is used to forecast over a selected future period, in this example 10 years.

Code
refit_tbl <- calibration_tbl %>%
    modeltime_refit(data = selected_series)  # Now trained on full dataset

forecast_tbl <- refit_tbl %>%
    modeltime_forecast(
        h = 10,  # Forecast for 10 years
        actual_data = selected_series,
        keep_data = TRUE  # Keep historical data for better visualization
    )
forecast_tbl %>%
    plot_modeltime_forecast(
        .legend_max_width = 25, 
        .interactive = TRUE,
        .plotly_slider = TRUE
    )

5.4.7 Examine the model accuracy metrics after refitting the full data set

Code
refit_tbl %>%
    modeltime_accuracy() %>%
    table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table
.model_id .model_desc .type mae mape mase smape rmse rsq
1 ETS(M,A,N) Test 16181.62 31.88 1.49 40.03 19413.03 0.73
2 UPDATE: ARIMA(0,1,0)(0,0,1)[5] WITH DRIFT Test 13059.38 24.56 1.20 30.18 16738.06 0.73
3 UPDATE: ARIMA(0,1,0)(0,0,1)[5] WITH DRIFT Test 13059.38 24.56 1.20 30.18 16738.06 0.73
4 PROPHET Test 16031.61 31.46 1.48 39.53 19329.48 0.61

5.4.8 Takeaway from model accuracy metrics

  • ARIMA(0,1,0)(0,0,1)[5] WITH DRIFT is the best-performing model in this comparison, as it has the lowest error metrics and the highest R².
  • ETS(M,A,N) performed the worst, with the highest error values.
  • PROPHET had moderate performance but lower R², making it less reliable than ARIMA.
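The comparison above can also be made programmatic. As a sketch, assuming the calibration_tbl built in section 5.4.4 is still in the environment, the accuracy table can be sorted to pick the lowest-RMSE model automatically:

```r
library(dplyr)
library(modeltime)

# Sketch: select the best-performing model by RMSE from the accuracy table
best_model <- calibration_tbl %>%
  modeltime_accuracy() %>%
  slice_min(rmse, n = 1, with_ties = FALSE)

print(best_model$.model_desc)
```

Note that with_ties = FALSE keeps a single row even when two models (here the two identical ARIMA fits) share the minimum RMSE; other metrics such as MASE or MAPE could be substituted depending on the forecasting objective.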